| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by xienze 232 days ago
	> I don't know what kind of data you are dealing with but its illogical and against all best practices to have this many keys in a single object. The whole point of this project is to handle efficiently parsing "huge" JSON documents. If 65K keys is considered outrageously large, surely you can make do with a regular JSON parser.

1 comments

pshirshov 232 days ago

> If 65K keys is considered outrageously large

You can split it yourself. If you can't, replace Shorts with Ints in the implementation and it would just work, but I would be very happy to know your usecase.

Just bumping the pointer size to cover relatively rare usecases is wasteful. It can be partially mitigated with more tags and tricks, but it still would be wasteful. A tiny chunking layer is easy to implement and I don't see any downsides in that.

link

cogman10 232 days ago

How wasteful?

Presumably 4 bytes dedicated to the keys would be dwarfed by any strings thrown into the dataset.

Regardless, other than complexity, would there be any reason to not support a dynamic key size? You could dedicate the first 2 bits on the key to the length of the key. 1 byte would work if there's only 64 keys, 2 bytes would give you 16k keys and 3 4M. And if you wanted to you could use a frequency table to order the pointers such that more frequently used keys are smaller values in the dictionary.

link

pshirshov 232 days ago

Most of the data the library originally was written for consists of small objects and arrays with high levels of duplication (think state of the world in a videogame with tons of slightly varying objects). Pointer sizes really matter.

link