Hacker News new | ask | show | jobs
by devj 2715 days ago
Few doubts:

1. Any reason to write it in Java instead of C, C++, Rust, etc?

2. Any reason to use Protobuf instead of Flatbuffers, Avro, etc?

3. Can FoundationdDB be used with Apache Arrow?

1 comments

The Record Layer is written in Java as it was designed to fit in with an existing stack that was already primarily Java-based. You can read more about how CloudKit uses the Record Layer in the preprint of the Record Layer paper: https://www.foundationdb.org/files/record-layer-paper.pdf

Excellent question regarding the choice to use Protocol Buffers. Firstly, as mentioned in the paper released last year, CloudKit uses Protocol Buffers for client-server intercommunication. As a result, there was already expertise around protobuf, which is a good tie breaker when evaluating alternatives. (Here's that paper, by the way: http://www.vldb.org/pvldb/vol11/p540-shraer.pdf) Secondly, the Record Layer makes heavy use of Protocol Buffer descriptors, which specify the field types and names within protobuf schemata, and dynamic messages. Descriptors are used internally within the Record Layer to do things like schema validation. (For example, if an index is defined on a specific field, the descriptor can be checked to validate that that field exists in the given record type.) Likewise, dynamic messages make it possible for applications using the Record Layer to load their schema at run time by reading it from storage. The FDBMetaDataStore allows the user to do exactly that (while storing the schema persistently in FoundationDB): https://static.javadoc.io/org.foundationdb/fdb-record-layer-...

The Record Layer's data format is not compatible with the specification specified by Apache Arrow, no.

Thanks for your reply. Would be really helpful if you can share the following:

1. Size of the CloudKit cluster and the number of RecordLayer instances. A ratio would also be enough to get an approx. idea.

2. How metadata changes involving field data type are being handled?

3. How are relationships and therefore, foreign keys handled? Are any referential actions like cascading deletes supported?

The Record Layer doesn't currently support foreign key constraints, so foreign keys are more of an “design pattern” than a first-class feature. For example, in a sample schema in the repository, an “Order” message has have a field called “item_id” that points to the primary key of an “Item” message: https://github.com/FoundationDB/fdb-record-layer/blob/792c95... There isn't an automatic check to make sure the item exists, though, nor are there cascading deletes. That being said, I don't think the architecture is incompatible with that feature, so it would be a reasonable feature request.

There are some guidelines regarding field type changes in the schema evolution guide: https://foundationdb.github.io/fdb-record-layer/SchemaEvolut... Most data type changes are incompatible with either Protobuf's serialization format or the FDB Tuple layer's serialization format (which the Record Layer users for storing secondary indexes and primary keys). The general advice for type changes (if there are existing data in your record stores) would instead be to introduce a new field of the new type and deprecate the old one.