Hacker News new | ask | show | jobs
by creationix 97 days ago
Initial format docs are now here:

https://github.com/creationix/rx/blob/main/docs/rx-format.md

Railroad diagrams will come later when I have more time.

1 comments

Neat! In case you took me too literally: railroad diagrams are fun, but far from the only way to give spec level clarity, so don’t feel you need to overindex on my silly comment!

I am curious why it’s parsed right to left. Is this so that you could add new data to a top-level JSONL-esque list, solely by rewriting the end of the data structure, and not needing to change the beginning (or worst-case shift every single byte of data, if you need a longer count)?

It’s an interesting design tradeoff, because you can’t show a partial parse if you’re streaming the content naively beginning to end, which is a bit odd in a world where streams that begin to render token-by-token are all the rage.

But if you have an ability to do range queries, it’s quite effective, and it does allow for those incremental updates!

Tha main reason for the reverse encoding is it makes it easier on the writer. You simply do a depth-first traversal of the data graph and emit data on the way back up the stack. Zero buffering is needed since this naturally means you write contents before the length prefix.

But it does open up a future direction I want to make with mutable datasets using append-only persistent data structures. The chain primitive is currently only used for strings, but it will be used to do the equivalent of `{...oldObj, ...newObj}` as a single chain `(pointerToOldObj, newObj)`.

With chains and pointers, you can write new versions of a dataset and reuse all the existing values that are unchanged. This, combined with random-access reads and fixed-block caching makes for a fairly complete MVCC database.

And don't worry about railroad diagrams. I already intended to create them, I've just been extra busy this week with other things.