|
|
|
|
|
by pkhuong
757 days ago
|
|
I'd expect a comparison with compact (or even succinct) constructions for arbitrary functions, like MWHC. Section 3.2 of https://vigna.di.unimi.it/ftp/papers/TheoryPracticeMonotone.... has a good overview. Given a set S of arbitrary hashable values, it's possible to represent a function from S to r bits in |S|r + o(|S|) bits (keys outside S are mapped to random r-bit values). More practical construction hit ~1.23 |S|r, or even |S|(r + 1.23) bits. It should also be faster to evaluate than `r` bloom filter lookups for large datasets. I think the main advantage of the bloom filter (or compressed bitmap) approach is it can be updated incrementally. MWHC-style representations are better suited to build once / read many workloads. |
|
In contrast, a B-field lets you map a key to an arbitrary number of (typically non-unique) values. So I could map a million elements to "1", another million to "2", etc.
I'm not especially current (or fluent!) in that literature though, so would love pointers to anything that doesn't have the above constraints.