Hacker News new | ask | show | jobs
by jstanley 2349 days ago
Calculating SHA256 is neither O(1) nor cheap.

Your idea is interesting and might be useful in some applications! But calculating a SHA256 takes time linear in the length of the thing you're hashing. Even if you assume fixed-size keys, SHA256 isn't fast. It's not designed to be fast.

You'd likely need an extremely high N before O(N) SHA256's is faster than O(N log N) comparisons.

2 comments

> But calculating a SHA256 takes time linear in the length of the thing you're hashing.

Ahem, so does hashing.

And so does sort comparison.

The GP's idea is actually quite effective for medium-to-large objects.

As a bonus, you can cache and store the SHA256 of each object for future in-set testing and uniqueness-finding without looking at the objects themselves. (You cannot do this with non-cryptographic hashes or sorting by comparisons.)

This is basically what Git does.

Well SHA256 is a red herring here: one could suggest any suitable fast hash, non-crytopgraphic hash instead.

Yes, it takes time linear in the length of the thing you are sorting, but comparison sort is generally worse in that respect: comparison also takes up to linear time in the comprands, and you have to compare each element not just once but log n times [1].

So the time to hash all the elements once (if the hash is not already cached in the element) is going to be a small part of the sort time.

[1] There are ways around repeatedly re-comparing the entire key, but they work mostly in specialized cases.