Hacker News
by Asmod4n 98 days ago
The cost of using a textual format is that floats become slow to parse: over 14 times slower than parsing a normal integer, even with the fastest SIMD algorithms we have right now.
3 comments

So it depends. Float parsing performance is only a problem if you parse many floats, and lazy access might reduce work significantly (or add overhead: it depends).
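To make the lazy-access trade-off concrete, here is a minimal sketch. It assumes a comma-separated run of numbers; `LazyNumbers` and `parse_count` are made-up names for illustration, not part of rx or any real library. Floats are parsed only when an element is read, so untouched values cost nothing, while the cache lookup is the overhead paid on every access.

```python
class LazyNumbers:
    """Hypothetical wrapper: keeps raw text tokens, parses a float on first access."""

    def __init__(self, text: str):
        # Splitting is cheap compared to float parsing; no float() calls happen here.
        self._tokens = text.split(",")
        self._cache: dict[int, float] = {}
        self.parse_count = 0  # how many floats were actually parsed

    def __len__(self) -> int:
        return len(self._tokens)

    def __getitem__(self, i: int) -> float:
        # Parse on demand and remember the result for repeated reads.
        if i not in self._cache:
            self._cache[i] = float(self._tokens[i])
            self.parse_count += 1
        return self._cache[i]


nums = LazyNumbers("1.5,2.25,3.125")
value = nums[1]  # only this one token is parsed
```

If a query touches one value out of thousands, almost all parsing is skipped. If it touches every value, the bookkeeping makes this slower than parsing everything up front.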
Exactly. For my use cases, this format is amazing. I have very few floats, but lots and lots of objects, arrays and strings with moderate levels of duplication and substring duplication. My data is produced in a build and then read in thousands or millions of tiny queries that look up a single value deep inside the structure.

rx works very well as a kind of embedded database like sqlite, but completely unstructured like JSON.

Also I'm working on an extension that makes it mutable using append-only persistent data structures with a fixed-block caching level that is actually a pretty good database.

If your data is lots and lots of arrays of floats, this is likely not the format for you. Use float arrays.

Also note that it stores decimals in a very compact encoding (two varints, for base and power of 10).
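A rough sketch of the "base and power of 10" idea, to show why it is compact. This is an assumption-laden illustration, not rx's actual encoding: it uses byte-oriented LEB128-style varints with zigzag for signed values, and all function names are made up. It handles finite decimals only and drops the sign of negative zero.

```python
from decimal import Decimal


def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(u: int) -> int:
    return (u >> 1) if (u & 1) == 0 else -((u + 1) >> 1)


def write_varint(u: int) -> bytes:
    # 7 payload bits per byte; the high bit marks "more bytes follow".
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_decimal(text: str) -> bytes:
    # "3.14" -> base 314, power of ten -2 -> two varints.
    sign, digits, exponent = Decimal(text).as_tuple()
    base = int("".join(map(str, digits)))
    if sign:
        base = -base
    return write_varint(zigzag(base)) + write_varint(zigzag(exponent))


def decode_decimal(buf: bytes) -> Decimal:
    base_u, pos = read_varint(buf, 0)
    exp_u, _ = read_varint(buf, pos)
    base, exponent = unzigzag(base_u), unzigzag(exp_u)
    digits = tuple(int(d) for d in str(abs(base)))
    # Building from a tuple keeps the value exact, with no context rounding.
    return Decimal((1 if base < 0 else 0, digits, exponent))


encoded = encode_decimal("3.14")
decoded = decode_decimal(encoded)
```

Under these assumptions "3.14" takes three bytes, and decoding needs only integer work, no float parsing.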

That said, while this is a text format, it is also technically binary safe and could be extended with a new type tag to contain binary data if desired.

And with little data (i.e. <10 MB), this matters much less than accessibility and easy understanding of the data using a simple text editor or jq in the terminal + some filters.
What do you mean by little data? Most communication protocols are not one-off.
Also good luck parsing 10 MiB of JSON in a loop that can't tolerate blocking the CPU for more than 10ms.

What's expensive is very relative to the use case.