| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by wood_spirit 957 days ago

My own lessons from writing fast json parsers has a lot of language-type things but here are some generalisations:

Avoid heap allocations in tokenising. Have a tokeniser that is a function that returns a stack-allocated struct or an int64 token that is a packed field describing the start, length and type offsets etc of the token.

Avoid heap allocations in parsing: support a getString(key String) type interface for clients that what to chop up a buffer.

For deserialising to object where you know the fields at compile time, generally generate a switch case of key length before comparing string values.

My experience in data pipelines that process lots of json is that choice of json library can be a 3-10x performance difference and that all the main parsers want to allocate objects.

If the classes you are serialising or deserialising is known at compile time then Jackson Java does a good job but you can get a 2x boost with careful coding and profiling.

Whereas if you are paying aribrary json then all the mainstream parsers want to do lots of allocations that a more intrusive parser that you write yourself can avoid, and that you can make massive performance wins if you are processing thousands or millions of objects per second.