Hacker News new | ask | show | jobs
by Reubensson 1515 days ago
Isn't key derivation function completely separate from aes implementation. I mean you could have used the same broken key derivation with some other aes implementation.

Also aes-1024 sounds like some proprietary thingy, not something people should probably trust anyway...

5 comments

It's definitely proprietary. In the summary below the video, it states:

> It turned out that the key derivation function was PBKDF2 using 1000 iteration of MD5 to derive the encryption key. The salt used to derive the keys is constant and hardcoded in all the solutions and all the vendors. This makes it easier for an attacker to guess the user password of a vault using time/memory tradeoff attack techniques such as rainbow tables and to re-use the tables to retrieve passwords for all users using the software. The implementation itself was incorrect and even with a randomly generated unique salt, it would be effortless to recover the password of a user.

I'd stick with veracrypt for now.

Agreed: 'Breaking the DataVault encryption software' would be a better title
Any time I see encryption described as "military grade" it usually sets of my bullshit detector.
Maybe it was Russian military grade.
The complaint is that the user supplied password is easier to guess than it could be. A fast hash is used and not very many times. So you might have to, say, use 5 words rather than, say, 3 words in your diceware generated passphrase if you want to be secure against brute force attacks.

This ends up being a common usability issue whenever a user is asked to provide a passphrase for some sort of symmetrical encryption scheme. The user is almost never given any guidance to allow them to chose a passphrase strong enough for the system in use. So they end up with a dictionary word with a digit on the end and have no way to know that they have not actually protected anything. It ends up being sort of a con in practice. The user is allowed to believe that the system is much more convenient to use than it actually is.

The system under consideration is not really any worse than other things in this.

I mean it's worse from several points of view from the best-case implementation, other than just the grossly-insufficient iteration count.

First of all, every single devices shares a common, constant salt. Although that doesn't simplify a single instance of an attack on the key-derivation function, it permits the construction of rainbow tables to trade off storage against time and attack all devices simultaneously.

Secondly, the key-derivation function has a flawed construction. In PBKDF2, if you need more key bits than the hash function outputs, you repeat the process varying a value that's concatenated to the salt as the initial input. This means the cost to generate a key to test scales with the length of the key. In this implementation, the salt + constant is not actually an input in the hash function at all, but instead XOR'd against the result of the hash.

As a result, you get no increase in work factor at all.

There are other problems in the implementation of the actual cipher (AES-CTR is malleable, not authenticated; keys longer than 128 bits are only actually using half of the additional key bits - "AES-1024" only uses 576 bits of the generated key) but the KDF is the real problem.

> So they end up with a dictionary word with a digit on the end and have no way to know that they have not actually protected anything

a dictionary word with common letters substituted with a number, case-sensitive, and one or two punctuation.. that is "not protected anything" ? .. almost any two dictionary words put together, not even case sensitive also "not protected anything" ? the out-of-breath security analysis is bothersome and lead us to mandatory ten characters of garble and other extreme anti-user patterns.. I am looking at a stack of forty accounts with passwords as an ordinary library user.. not convinced of this expert analysis today

If your threat model is "someone cloned the database and can now perform unlimited attacks against the stored passwords", then yeah, word + digit protects just about nothing. Assuming a lexicon of 5,000 words, word+digit gives you about 50,000 variations to try. Say that L337 substitutions give you another 10x factor, so now you have 500,000 candidates for what the password might be. Now lets assume that instead of the stupid crap they did in this video, the folks storing your password did everything right and used bcrypt with a work factor of 12. A cracking rig from a couple years ago can run something like 10,000 hashes per second under these conditions, so it might take a whole minute to discover your password. (Remember this is if they did it right, most other password storage schemes would yield your password in a fraction of a second.)

Or, we could look at the two-words-separated-by-punctuation case. Same 5,000 word lexicon, maybe 10 different symbols likely to show up between the words. Call that ~250,000,000 possibilities for your password. That'll take up to a day to crack. A day is a long time to spend on one password, but maybe they don't have anything better to do. Maybe they hate you personally. Add another word, suddenly the hackers need years per password, which is obviously uneconomical.

These guidelines don't come out of nowhere, and there isn't really a tower of experts somewhere giggling at the unwashed idiots around them (well, there might be, but I wasn't invited). This is just one of many problems in computing that live around the intersection of math and psychology, where the "natural" thing to do is (unintuitively) quite dangerous.

the Oxford English dictionary has +200,000 words. Split the difference between your 5000 and that 200,000 and call it 100,000 word possibilities eliminating the 1,2 or three letter ones, case-sensitive with your own case rule .. (capital-S in the middle is fair).

each word, of say at least four characters.. with a "simple substitution of a letter with a number" .. which number? 10x per substitution..

add one or two special characters.. how many special characters are there? lets say numerals plus at least 16 more (counts key caps).. one or two adds means .. up to (10+16) squared more combinations

two words.. square that again? what am I missing? a brute force attack on that many combinations better include the right set of special characters.. because you will never match if you do not have the right set of characters in your brute force, right?

now, "10,000 hashes per second under these conditions" means you have hashed the guess, and compare to the hashed stored answer.. sure that is fast.. maybe you can do it, but did you say that you have a copy of the database and can run constant, undetected brute force in private for "forever" ? is that common? specific answers welcome

As far as I can see: yes and yes. Clickbait title.