HN Mirror



	by Asmod4n 98 days ago
	The cost of using a textual format is that floats become so slow to parse that parsing one is over 14 times slower than parsing a normal integer, even with the fastest SIMD algorithms we have right now.


HelloNurse 98 days ago

So it depends. Float parsing performance is only a problem if you parse many floats, and lazy access might reduce the work significantly (or it might add overhead; again, it depends).
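To make the lazy-access trade-off concrete, here is a minimal Python sketch that is not tied to rx or any real parser: each field keeps its raw text and is converted with `float()` only on first read, then cached. The names `LazyFloat` and `lazy_row` are hypothetical. Values that are never read cost nothing to parse; the extra wrapper object per value is the overhead side of the trade.

```python
class LazyFloat:
    """Hold a float's raw text; parse it only on first access (hypothetical helper)."""

    def __init__(self, raw: str) -> None:
        self._raw = raw        # unparsed text, e.g. "3.125"
        self._value = None     # filled in on first access
        self.parse_count = 0   # how many times float() actually ran

    @property
    def value(self) -> float:
        if self._value is None:
            self._value = float(self._raw)
            self.parse_count += 1
        return self._value


def lazy_row(text: str) -> list:
    # Splitting is cheap; float() only runs for fields that are actually read.
    return [LazyFloat(field) for field in text.split(",")]


row = lazy_row("1.5,2.25,3.125,4.0")
print(row[2].value)                      # 3.125
print(row[2].value)                      # 3.125 (served from the cache)
print(sum(f.parse_count for f in row))   # 1
```

If nearly every value ends up being read, the wrappers only add allocation and indirection on top of the same parsing work, which is the "or add overhead" case.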


creationix 98 days ago

Exactly. For my use cases, this format is amazing. I have very few floats, but lots and lots of objects, arrays and strings with moderate levels of duplication and substring duplication. My data is produced in a build and then read by thousands or millions of tiny queries that look up a single value deep inside the structure.

rx works very well as a kind of embedded database like SQLite, but completely unstructured like JSON.

Also, I'm working on an extension that makes it mutable using append-only persistent data structures with a fixed-block caching level; the result is actually a pretty good database.
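For readers unfamiliar with the idea, here is one textbook form of an append-only persistent structure, path copying in a binary search tree, as a minimal Python sketch. It is not the design of the extension mentioned here (the thread doesn't describe it): the in-memory list stands in for an append-only file, the tree is unbalanced, and the fixed-block caching level is not modeled. `insert`, `lookup` and `Log` are hypothetical names.

```python
from typing import Optional

# Every node ever written lives in an append-only list (standing in for an
# append-only file). Nodes are never modified: an update copies the path from
# the root down to the changed node, so every older root is still a complete,
# unchanged snapshot.
Log = list  # entries are (key, value, left_index, right_index)


def insert(log: Log, root: Optional[int], key: str, value: str) -> int:
    """Append new nodes and return the index of a root that includes key -> value."""
    if root is None:
        log.append((key, value, None, None))
        return len(log) - 1
    k, v, left, right = log[root]
    if key < k:
        new_left = insert(log, left, key, value)
        log.append((k, v, new_left, right))
    elif key > k:
        new_right = insert(log, right, key, value)
        log.append((k, v, left, new_right))
    else:
        log.append((k, value, left, right))  # new value, same children
    return len(log) - 1


def lookup(log: Log, root: Optional[int], key: str) -> Optional[str]:
    while root is not None:
        k, v, left, right = log[root]
        if key == k:
            return v
        root = left if key < k else right
    return None


log: Log = []
v1 = insert(log, None, "b", "1")
v2 = insert(log, v1, "a", "2")
v3 = insert(log, v2, "b", "3")
print(lookup(log, v2, "b"), lookup(log, v3, "b"))  # 1 3
print(lookup(log, v1, "a"))                        # None
```

Each version is just a root index, so old snapshots stay readable after later writes, and writes only ever append.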


creationix 98 days ago

If your data is lots and lots of arrays of floats, this is likely not the format for you. Use float arrays.
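As a rough illustration of the "use float arrays" advice, Python's standard `array` module stores doubles as raw IEEE 754 bytes, so reading them back is a byte copy rather than text parsing. This is a generic sketch, not part of rx.

```python
from array import array

# Pack floats as raw C doubles (8 bytes each on common platforms); reading
# them back needs no text parsing at all.
values = [0.1, 2.5, -3.75, 1e-9]
packed = array("d", values).tobytes()

restored = array("d")
restored.frombytes(packed)

print(len(packed))                   # 32
print(restored.tolist() == values)   # True
```

`tobytes()` uses the machine's native byte order; a portable file would fix one order, for example with `array.byteswap()` on big-endian hosts or `struct` with an explicit `<` prefix.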

Also note that it stores decimals in a very compact encoding (two varints: one for the base and one for the power of 10).
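A minimal sketch of what "two varints for base and power of 10" can look like: `12.5` becomes the pair (125, -1). The byte-level choices are assumptions, not the rx encoding, which the thread doesn't spell out (and since rx is a text format, it may well use textual digits rather than raw bytes). Here signed values are zigzag-mapped and then written as LEB128-style varints. Special values such as NaN and infinity are not handled.

```python
from decimal import Decimal


def zigzag(n: int) -> int:
    # Map signed to unsigned so small negatives stay small: 0, -1, 1, -2 -> 0, 1, 2, 3
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z: int) -> int:
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def write_varint(n: int, out: bytearray) -> None:
    # LEB128-style: 7 payload bits per byte, high bit set on every byte but the last.
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return


def read_varint(buf: bytes, pos: int) -> tuple:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_decimal(text: str) -> bytes:
    # "12.5" -> sign 0, digits (1, 2, 5), exponent -1 -> base 125, power -1
    sign, digits, exponent = Decimal(text).as_tuple()
    base = int("".join(map(str, digits))) * (-1 if sign else 1)
    out = bytearray()
    write_varint(zigzag(base), out)
    write_varint(zigzag(exponent), out)
    return bytes(out)


def decode_decimal(buf: bytes) -> Decimal:
    z_base, pos = read_varint(buf, 0)
    z_exp, _ = read_varint(buf, pos)
    return Decimal(unzigzag(z_base)).scaleb(unzigzag(z_exp))


encoded = encode_decimal("12.5")
print(encoded.hex(), len(encoded))  # fa0101 3
print(decode_decimal(encoded))      # 12.5
```

Small bases and exponents take one or two bytes each, which is where the compactness comes from, and the value round-trips exactly instead of going through a binary float.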

That said, while this is a text format, it is also technically binary safe and could be extended with a new type tag to contain binary data if desired.
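One common way to make a text format binary safe is to length-prefix raw payloads, so the reader skips ahead by a count instead of scanning for delimiters. The sketch below, similar in spirit to netstrings, uses a hypothetical `b` type tag followed by a decimal length and a colon; the tag and layout are illustrative assumptions, not an actual rx extension.

```python
TAG = b"b"  # hypothetical type tag for a binary blob


def encode_blob(data: bytes) -> bytes:
    # Tag, ASCII decimal length, colon, then the raw bytes, unescaped.
    return TAG + str(len(data)).encode("ascii") + b":" + data


def decode_blob(buf: bytes, pos: int = 0) -> tuple:
    """Return (payload, position just after it)."""
    if buf[pos:pos + 1] != TAG:
        raise ValueError("not a blob")
    colon = buf.index(b":", pos)
    length = int(buf[pos + 1:colon])
    start = colon + 1
    end = start + length
    if end > len(buf):
        raise ValueError("truncated blob")
    # The payload is taken by length, so any byte value may appear in it.
    return buf[start:end], end


payload = bytes([0, 10, 34, 255])  # NUL, newline, quote, 0xFF
encoded = encode_blob(payload) + encode_blob(b"ok")
first, pos = decode_blob(encoded)
second, _ = decode_blob(encoded, pos)
print(first == payload, second)  # True b'ok'
```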


meehai 98 days ago

And with little data (i.e., under 10 MB), this matters much less than accessibility: being able to understand the data easily with a simple text editor, or with jq and a few filters in the terminal.


xxs 98 days ago

What do you mean by little data? Most communication protocols are not one-off.


creationix 98 days ago

Also, good luck parsing 10 MiB of JSON in a loop that can't tolerate blocking the CPU for more than 10 ms.
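For a sense of scale, the parse speed needed to fit a payload into a fixed time budget is just size divided by time. For 10 MiB in 10 ms that is about 0.98 GiB/s, sustained, with no slack left in that slice for anything else. A small helper (hypothetical name) for the arithmetic:

```python
MIB = 1024 * 1024
GIB = 1024 ** 3


def required_throughput_gib_s(payload_bytes: int, budget_ms: float) -> float:
    """Parse speed in GiB/s needed to process payload_bytes within budget_ms."""
    bytes_per_second = payload_bytes * 1000 / budget_ms
    return bytes_per_second / GIB


print(round(required_throughput_gib_s(10 * MIB, 10), 3))  # 0.977
```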

What counts as expensive depends heavily on the use case.
