by rurban 375 days ago
My special use case is, e.g., Unicode property checks, which have millions of keys and for which gperf does not scale. Other cases are integer keys.

I'm certainly not willing to load the keys and MPHF properties from disk at query time. They are known in advance and can be compiled to C or C++ code ahead of time, which gives an instant load time, in contrast to the costly deserialization in all your tools.
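To illustrate what "compiled in advance" could look like, here is a minimal sketch of a build-time generator that turns a key set known ahead of time into a static C array. The function name `emit_c_table`, the table name `demo_keys`, and the four sample keys are made up for illustration; a real generator would emit the hash parameters as well, not only the sorted keys.

```python
def emit_c_table(name, keys):
    """Return C source for a sorted, de-duplicated static array of integer keys.

    The result is meant to be written to a .c/.h file at build time, so the
    table ends up in the binary and nothing has to be deserialized at startup.
    """
    values = sorted(set(keys))
    body = ", ".join(f"{v}u" for v in values)
    return (
        f"static const unsigned int {name}[] = {{ {body} }};\n"
        f"static const unsigned int {name}_len = {len(values)}u;\n"
    )


# Hypothetical sample keys (code points), purely for demonstration.
source = emit_c_table("demo_keys", [0x7A, 0x41, 0x61, 0x5A, 0x41])
print(source)
```

The generated text would be compiled together with the lookup code, so the only load-time cost is the one the linker and OS already pay for mapping the binary.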

Your deserialization space overhead is not accounted for, and neither are the storage costs of the false-positive check. It's rather academic, not practical.

1 comment

There is no deserialization time or space overhead. The measurements refer to the deserialized form. They are not loaded from disk.

About false positive checks, I think you misunderstand what a perfect hash function does.

See, everybody can see how you cheat on your benchmarks. It's unrealistic to measure only the perfect hash function while discarding the cost of deserialization and false-positive checks.

That's always the case when dealing with you academics.

We don't care how you define your costs when we laugh about that. In reality you have to load the keys and the data structures, and check for existence. You even discard the existence checks by omitting the false-positive checks. Checking against a non-existing key will lead to a hit in your check. Only very rarely do you know in advance whether your key is in the set.
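The point under dispute can be shown with a small sketch. A perfect hash function only maps each key of the build set to a distinct slot; a key outside the set still lands on some slot, so membership needs an extra comparison against stored keys (or fingerprints), and that storage is the cost being argued about. Everything here is an assumption for illustration: the brute-force seed search, the use of `hashlib.blake2b` as the seeded hash, and the sample keys are not taken from any tool discussed in this thread.

```python
import hashlib


def slot(key: str, seed: int, n: int) -> int:
    """Seeded hash of `key`, reduced to a slot index in [0, n)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % n


def build(keys, max_seeds=100_000):
    """Brute-force a seed that maps the distinct `keys` to distinct slots.

    Returns (seed, table), where table[slot] holds the key owning that slot.
    Only practical for tiny key sets; real MPHF builders are far smarter.
    """
    n = len(keys)
    for seed in range(max_seeds):
        table = [None] * n
        for k in keys:
            i = slot(k, seed, n)
            if table[i] is not None:
                break  # collision, try the next seed
            table[i] = k
        else:
            return seed, table
    raise ValueError("no collision-free seed found")


def contains(key: str, seed: int, table) -> bool:
    """Membership test: hash to a slot, then compare with the stored key."""
    return table[slot(key, seed, len(table))] == key


KEYS = ["Lu", "Ll", "Nd", "Zs"]  # sample keys, chosen arbitrarily
SEED, TABLE = build(KEYS)

# A non-member still gets a valid slot from the hash function alone...
foreign_slot = slot("Xx", SEED, len(TABLE))
# ...and only the comparison against the stored key rejects it.
print(foreign_slot, contains("Xx", SEED, TABLE), contains("Nd", SEED, TABLE))
```

Without `TABLE`, the hash alone cannot tell "Xx" apart from whichever key owns its slot. With it, the structure has to store the keys or at least a fingerprint per key.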

Also, you miss the space costs for the ordering.