by fxaguessy 3413 days ago
(I'm one of the core developers of awless)

The hash functions are irreversible, so it is impossible to recover the original identifiers.

We added these anonymous IDs in order to know which commands are used most per user.

Anyway, if you have better ideas on how to manage this, feel free to make a pull request or create a GitHub issue. And if you prefer to disable it, you can also do that easily in the source code (you just need to comment out a few lines).

Edit: We opened an issue for this topic on our GitHub repo: https://github.com/wallix/awless/issues/38 . Feel free to continue the discussion there.

3 comments

You don't need to break SHA256 to de-anonymize these values.

`awless` collects account number hashes. AWS account numbers are 12 decimal digits long, meaning there's a total of 10^12 unique values. Values are anonymized before submission using a single round of SHA256, so in ~2^40 hash operations, anyone with your database of hashes can invert every single account number.
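To make the size of that search concrete, here is a minimal Python sketch of the enumeration described above. The function names, the empty default salt, and the `limit` parameter are assumptions for illustration only; this is not the actual awless hashing code.

```python
import hashlib
import math
from typing import Optional


def hash_account_id(account_id: str, salt: bytes = b"") -> str:
    """Single round of SHA256 over salt + account ID (hypothetical scheme)."""
    return hashlib.sha256(salt + account_id.encode("ascii")).hexdigest()


def invert(target_hash: str, digits: int = 12,
           limit: Optional[int] = None, salt: bytes = b"") -> Optional[str]:
    """Try every zero-padded decimal ID until one hashes to target_hash.

    `limit` caps the search so the sketch finishes quickly; a full sweep
    would cover all 10**digits candidates.
    """
    upper = 10 ** digits if limit is None else limit
    for n in range(upper):
        candidate = f"{n:0{digits}d}"
        if hash_account_id(candidate, salt) == target_hash:
            return candidate
    return None


# Size of the full 12-digit search space, expressed in bits (just under 40).
SEARCH_SPACE_BITS = math.log2(10 ** 12)
```

A known salt does not change the picture: it is simply passed to `invert` along with the target hash.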

For comparison, the Bitcoin blockchain presently has a hash rate of ~2^61 SHA256 hashes per second. (Edit: I incorrectly stated 2^41 based on a hash rate of 3 TH/s, when it's actually 3 million TH/s.)

On my not-so-special spare server, I'm able to pregenerate the hashes with that fixed salt at 344,191 per second. So, it would take only about a month to compute them for every 12-digit AWS account number. And, as mentioned, that's on my not-so-fast spare server, running in one process, one thread.

acct [000003441910] has hash [d2a52833a6e434d2a55be0ce852c2dd9c5260c49a7c28ea4fa3fe2ac6d054d7e] (the last one it finished in 10 seconds)

A little effort with a decent GPU and hashcat, though, would bring this exercise down to a few minutes.
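The "about a month" figure follows directly from the measured rate. A rough Python check, where the GPU rate is an assumed round number chosen for illustration rather than a benchmark:

```python
SEARCH_SPACE = 10 ** 12          # every 12-digit account number
CPU_RATE = 344_191               # hashes per second, from the measurement above
ASSUMED_GPU_RATE = 3 * 10 ** 9   # hypothetical GPU rate, hashes per second


def sweep_seconds(space: int, hashes_per_second: float) -> float:
    """Seconds needed to hash every value in the space once."""
    return space / hashes_per_second


cpu_days = sweep_seconds(SEARCH_SPACE, CPU_RATE) / 86_400
gpu_minutes = sweep_seconds(SEARCH_SPACE, ASSUMED_GPU_RATE) / 60

print(f"single CPU thread: {cpu_days:.1f} days")
print(f"assumed GPU rate:  {gpu_minutes:.1f} minutes")
```

At the single-threaded rate this works out to a little under 34 days; at the assumed GPU rate it drops to a few minutes.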

Good point. Thanks for the advice, we will quickly look into how we can improve this. Our goal is above all to make the usage of AWS easier and, as a result, more secure. We do not want to expose the CLI users to any new threat. We made the source code available to anyone (even the anonymous data collection), to be transparent and get feedback on our work so we can correct it when needed.
I opened an issue:

https://github.com/wallix/awless/issues/39

PBKDF2, bcrypt, and scrypt are all used where a database needs to store something and check for equality, but where the values in the database need to not be reversible even if the database is breached. They might be suitable here.

None of those can deal with an input range this limited. Even if you use a million rounds, you've only multiplied the workload by about 2^20.
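As a rough illustration of that point, the sketch below uses PBKDF2 from Python's standard library and counts the attacker's work in bits. The helper names and the salt are hypothetical, and the cost model (one unit of work per round per candidate) is a simplifying assumption.

```python
import hashlib
import math


def stretched_hash(account_id: str, salt: bytes, rounds: int) -> str:
    """PBKDF2-HMAC-SHA256 over the account ID (illustrative parameters)."""
    return hashlib.pbkdf2_hmac(
        "sha256", account_id.encode("ascii"), salt, rounds
    ).hex()


def attack_cost_bits(digits: int, rounds: int) -> float:
    """log2 of the total hash operations to sweep every candidate ID."""
    return math.log2(10 ** digits) + math.log2(rounds)


# About 40 bits of input plus 20 bits of stretching gives roughly 60 bits.
print(round(attack_cost_bits(12, 2 ** 20), 1))
```

Stretching raises the cost per guess, but the number of guesses stays fixed at 10^12, so the total remains within reach of the hash rates quoted in this thread.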
Different algo, but my 970 can perform 3.4 billion SHA1 hashes per second on the low setting in hashcat.
You can create a randomly generated cookie of sorts instead of doing anything with a user's credentials. The task accomplished and the end goal would be the same, and yet people would feel more comfortable.

Your claim that you are using an irreversible hash is not comforting.

Your forced data collection is also not comforting.
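A minimal sketch of the random-cookie approach suggested in this comment, in Python. The function name and the storage path are hypothetical; the only idea taken from the thread is that the identifier is random and derived from nothing about the user's account.

```python
import secrets
from pathlib import Path


def load_or_create_install_id(path: Path) -> str:
    """Return the stored random ID, generating and saving one on first use."""
    if path.exists():
        existing = path.read_text(encoding="ascii").strip()
        if existing:
            return existing
    # 128 bits from the OS CSPRNG; carries no information about the user.
    new_id = secrets.token_hex(16)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id, encoding="ascii")
    return new_id
```

Calling it twice with the same path returns the same value, so usage on one install can still be grouped without touching account numbers.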

> You can create a randomly generated cookie of sorts instead of doing anything with a user's credentials.

That throws off their statistical analysis. Random cookies generate a new cookie for each new install or re-install, inflating the "users" count. If someone installs this on five different servers, the stats under random cookies will show five separate streams of data, and they will draw the improper conclusion that a particular operation used on all of those servers is five times more popular than it really is. A configuration flag to disable the data collection is reasonable, but using a well-known hash like Whirlpool to anonymize the data stream is also reasonable.

If someone doesn't like data collection, then they shouldn't use cloud products, and they should just as vociferously declaim cloud services. With cloud services, whether or not the usage data collection is anonymized is at vendor discretion, but here, you control the source. Using a utility for a cloud service, and complaining about usage data collection, is ironic, considering AWS surely collects the same data.

> AWS surely collects the same data

Well of course they do, since all of these commands send off calls to AWS servers. And if you're using AWS products you already trust Amazon; that doesn't mean you trust a random person who put some code on GitHub.

This whole mess should be opt-in, but it's shocking that anyone thought uploading account IDs hashed with known salts was a good idea. How long did it take you to generate the rainbow table? What you did was more difficult than simply generating a random string as you should have done.