| HN Mirror


by jjenkov 3791 days ago

You are right, the term "self describing" as used in our docs could be clearer. Being self describing means that you do not need a schema to make sense of a stream of data in that format.

However, a data format can also be self describing to varying degrees. A CSV file is reasonably self describing because you can see where one field ends and the next begins (at the comma / separator), and where one record ends and the next begins (at the newline). With a header line of column names, a CSV file becomes more self describing, as you now also have a name indicating the semantic meaning of the fields in each column. If a CSV file could somehow contain a specification of the data type of each column, it would be even more self describing, and so on.
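To make the three levels concrete, here is a minimal Python sketch using the standard `csv` module. The function names are made up for illustration, and the `name:type` header convention in the last function is a hypothetical one, not part of CSV or of ION.

```python
import csv
import io

def read_positional(text):
    """Level 1: only delimiters are known, so fields are anonymous strings."""
    return list(csv.reader(io.StringIO(text)))

def read_named(text):
    """Level 2: a header line gives each field a name, values are still strings."""
    return list(csv.DictReader(io.StringIO(text)))

def read_typed(text):
    """Level 3: hypothetical 'name:type' header cells also carry a data type."""
    casts = {"int": int, "float": float, "str": str}
    rows = csv.reader(io.StringIO(text))
    columns = [cell.split(":", 1) for cell in next(rows)]
    return [
        {name: casts[type_name](value)
         for (name, type_name), value in zip(columns, row)}
        for row in rows
    ]
```

Each step needs less outside knowledge to interpret the same bytes: the first reader only finds field boundaries, the second knows what each field means, the third also knows how to parse it.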

This is what we are trying to achieve with ION. If you need speed, you can omit most of the metadata, such as property names. If you need messages to be self describing, you can add a lot of metadata (class / schema names + version, property names etc.).

I apologize for having written incorrect documentation. If you wrote those docs for Google Protocol Buffers, part of that is on you. They are not exactly crystal clear ;-) (our docs aren't either - still working on them!)

Thank you for clearing up that Protobuf fields can be distinguished in a stream of Protobuf fields, even without schema. That was unclear to me before now. By the way, that is pretty clear in Cap'n Proto - your invention right? So - better docs already!
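For readers following along, the point about Protobuf can be sketched in a few lines of Python. This is a rough illustration based on the publicly documented wire format, where each field starts with a varint key equal to `(field_number << 3) | wire_type`. The function names are made up here, and the sketch does no bounds checking on truncated input and rejects the deprecated group wire types.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear means last byte of the varint
            return result, pos
        shift += 7

def split_fields(buf):
    """Split a serialized message into (field_number, wire_type, raw_value)."""
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

Without a schema you get field boundaries, field numbers and a coarse wire type, but not field names or the exact type: a length-delimited value could be a string, raw bytes or a nested message.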

And - thank you for clearing up the difference in the encoding of Cap'n Proto. Any link to where I can read about that encoding style in more detail?

1 comments

kentonv 3791 days ago

> Any link to where I can read about that encoding style in more detail?

Hmm, I'm not aware of any literature other than what's on the Cap'n Proto web site. You can of course find the Cap'n Proto encoding documented here:

https://capnproto.org/encoding.html

The format is, of course, a lot like how in-memory data structures are laid out in C (fields of a struct have fixed offsets; variable-size fields are behind pointers). Unlike native pointers, though, Cap'n Proto's pointers are designed to be relocatable and easy to bounds-check, and they contain just enough type information for the message to be minimally self-describing (so that you can e.g. make a copy of a particular sub-object without knowing its schema).
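As a rough illustration of "just enough type information", here is a Python sketch that decodes a single struct pointer following the layout documented on the encoding page: a 64-bit little-endian word whose two low bits are 0 for a struct, followed by a signed 30-bit offset in words (counted from the end of the pointer), a 16-bit data section size and a 16-bit pointer section size, both in words. The names `StructPointer`, `decode_struct_pointer` and `struct_extent` are invented for this example, and far pointers, list pointers and multi-segment messages are ignored.

```python
import struct
from typing import NamedTuple, Optional

class StructPointer(NamedTuple):
    offset_words: int   # signed, relative to the word after the pointer
    data_words: int     # size of the struct's data section
    pointer_words: int  # size of the struct's pointer section

def decode_struct_pointer(word: bytes) -> Optional[StructPointer]:
    """Decode one 8-byte pointer word; return None for a null pointer."""
    (value,) = struct.unpack("<Q", word)
    if value == 0:
        return None  # an all-zero word is a null pointer
    if value & 0x3 != 0:
        raise ValueError("not a struct pointer")
    offset = (value >> 2) & 0x3FFFFFFF
    if offset & 0x20000000:   # sign-extend the 30-bit offset
        offset -= 0x40000000
    return StructPointer(offset, (value >> 32) & 0xFFFF, (value >> 48) & 0xFFFF)

def struct_extent(pointer_word_index: int, ptr: StructPointer):
    """Return (start_word, length_in_words) of the struct the pointer targets."""
    start = pointer_word_index + 1 + ptr.offset_words
    return start, ptr.data_words + ptr.pointer_words
```

Because the offset is relative and the sizes travel with the pointer, a reader can bounds-check the target and copy its words without knowing the schema. A deep copy would additionally have to follow each word in the pointer section the same way.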
