Dude, that's horrific. I figured some secure coders wouldve at least implemented a better JSON one by now since it's relatively simple. Or are they already available but dev's often rely on these broken ones?
Hash functions designed for hash tables are generally not hard to find collisions for, so there's not much that can be done. You could shoehorn in a secure hash function, but that would hurt performance.
Having per-table random seeds and properly designed hash function prevents remotely forced collisions. This is the typical defense against hashtable DOS.
It has to be done properly, of course -- for example, a poorly designed hash function could have characteristic collisions for many different seeds.
Please be aware that many common hash functions are easy to generate collisions for independent of the seed. This includes both MurmurHash and CityHash. These aren't poorly-designed hash functions: they have excellent distribution characteristics. But the way they use their seed wasn't designed to resist this kind of attack.
Rather than "poorly-designed", I should have said "cryptographically insecure".
Most hash functions are engineered for speed and collision resistance, in that order. Trading collision resistance for speed is worthwhile for many workloads, since it barely affects the average case.
> Most hash functions are engineered for speed and collision resistance, in that order.
I disagree with that assessment. I think most hash functions are designed for speed and good distribution of outputs given normal inputs. But designing to resist collisions against someone trying to deliberately create them is a different thing entirely.
> Trading collision resistance for speed is worthwhile for many workloads, since it barely affects the average case.
I think that is far from established. SipHash is marketed under this premise, but from what I have heard it is significantly slower, particularly for short inputs.
Yes, speed for normal inputs is what's generally desired.
A faster hash function can yield better overall performance than having fewer collisions. Here's some empirical evidence for this: https://www.strchr.com/hash_functions
The speed is correlated only weakly with the number of collisions-- and using the modern x86 CRC32 instruction yields the best results.
I was thinking more on lines of (a) does JSON handling need hash tables and (b) if do, do they have to be open addressing or something vulnerable to DOS? A No on either can lead to an implementation with better DOS resistance.
For B) All hash table implementations are vulnerable to this kind of attack. However, some kinds of chaining are less vulnerable. In particular, balanced binary tree chaining would make the worst-case lookup (and insertion?) time complexity O(log n), which is a significant improvement over probing or linked list chaining. The tricks mentioned in the other comments above also improve things, by making such an event less likely.
As for A), no. No, JSON handling doesn't need hashtables. Since your app will only look at certain values in the JSON, you can simply ignore all the other values, and dump the values in an object/struct of your choosing. It wouldn't even be all that hard to write, provided you know how to write a parser...
Since your app will only look at certain values in the JSON, you can simply ignore all the other values, and dump the values in an object/struct of your choosing. It wouldn't even be all that hard to write, provided you know how to write a parser..."
That's what I was thinking. Never seen verified parsers or generators requiring these things. So I figured it was an unnecessary requirement bringing in its own security issue.
A large part of the problem is that people want to be able to "just" de-serialize a chunk of JSON into a suitable generic structure they can then "just" de-reference as a suitable tree of dictionaries and arrays. It'd be fantastic to see easier-to-use patterns to discourage resorting to this, but it's very hard to beat for simplicity.