Hacker News new | ask | show | jobs
by planede 814 days ago
For dealing with larger data I would probably split the input bits into 63 bit chunks, which can be encoded in 19 decimal digits. 63 input bits turn into 19 digits which in turn is encoded in 63.33... output bits on average. This has an overhead of 0.53% instead of 0.34% of pure base10, which I think is acceptable. But then you don't have to bring bignum libraries into the picture, as each chunk fits into a 64bit integral type.

64bit chunks are a little bit worse, with 4.16% overhead, so it might be worth dealing with the little complexity of 63 bit chunks.

I would also output the decimal digits in little-endian order.

edit: If you are willing to go for larger chunks then 93bit chunks would be my next candidate, there the overhead is 0.36%, barely more than pure base10's 0.34%. I don't think it's worth going any higher.

1 comments

127 bit integers get 38 digits, which lines up well with 128 bit integers
No, they get 39 decimal digits (1.7e38 is 39 decimal digits). 127bit chunks would get you 2.36% overhead, which is not bad. However at 93bit chunks can (barely) be encoded in 28 digits (2^93 ~= 9.9e27) and it's more efficient at around 0.36% overhead. So once you have 128 bit arithmetic, it's still not worth using all or most of those bits per chunk, 93bit chunks is the most efficient under 128 bits.
aye, I was getting mixed up based on recently seeing what Snowflake allowed for storing decimal in binary, as opposed to this strange case of binary stored in decimal