I haven't got a full comprehension of 1 and why it would compress a lot better. It would be interesting to take a look.
I did consider perfect hashing, however IIUC it will map a fixed set of entries to sequential integers, but it won't do something clever for items not in the set, so I couldn't find a way to make it work well for this problem. I did see someone talking about perfect bloom filters but to be honest I didn't quite understand it https://iswsa.acm.org/bloom.html
I did consider perfect hashing, however IIUC it will map a fixed set of entries to sequential integers, but it won't do something clever for items not in the set, so I couldn't find a way to make it work well for this problem. I did see someone talking about perfect bloom filters but to be honest I didn't quite understand it https://iswsa.acm.org/bloom.html