HN Mirror

by ccleve 1494 days ago

tldr; Don't use relational tables or unstructured document databases. Instead use structured types. The "schema" here is ultimately a collection of independent objects / classes with well-defined fields.

Ok, fine. But I'm not sure how this helps if you have six different systems with six different definitions of a customer, and more importantly, different relationships between customers and other objects like orders or transactions or locations or communications.

I don't see their approach as ground-breaking, but it is definitely worthy of discussion.
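
The "structured types" summary above can be illustrated in miniature: give every value a type derived from its own structure, and the "schema" of a collection is then just the set of distinct types that appear in it. This is a minimal Python sketch that assumes JSON-like input. The type names (`int64`, `string`, ...) and the helpers `type_of` and `schema_of` are hypothetical and are not the article's actual type system.

```python
from typing import Any


def type_of(value: Any) -> str:
    """Return an illustrative structured-type signature for a JSON-like value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elems = sorted({type_of(v) for v in value})
        if not elems:
            inner = "null"  # empty array: no element type observed
        elif len(elems) == 1:
            inner = elems[0]
        else:
            inner = "(" + "|".join(elems) + ")"  # mixed elements -> union element type
        return "[" + inner + "]"
    if isinstance(value, dict):
        # Field order is part of the signature here (dicts keep insertion order).
        fields = ",".join(f"{k}:{type_of(v)}" for k, v in value.items())
        return "{" + fields + "}"
    raise TypeError(f"unsupported value: {value!r}")


def schema_of(records: list) -> dict:
    """The 'schema' is just the set of distinct record types, each with its records."""
    groups: dict = {}
    for r in records:
        groups.setdefault(type_of(r), []).append(r)
    return groups


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Bob"},
        {"id": "c-3", "name": "Cy", "email": "cy@example.com"},
    ]
    for t, rs in schema_of(customers).items():
        print(t, len(rs))
```

Running it prints two types: `{id:int64,name:string}` with two records and `{id:string,name:string,email:string}` with one. The third customer's string `id` produces a separate type instead of forcing the whole collection into one loosened table schema.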

abraxaz 1494 days ago

> Ok, fine. But I'm not sure how this helps if you have six different systems with six different definitions of a customer, and more importantly, different relationships between customers and other objects like orders or transactions or locations or communications.

If you have this problem, consider giving RDF a look. You can fairly easily use RDF-based technologies to map the data in these systems onto a common model. Some tools that may be useful here are https://www.w3.org/TR/r2rml/ and https://github.com/ontop/ontop, and you can also use JSON-LD to convert most JSON data to RDF. For more info, ask in https://gitter.im/linkeddata/chat
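
To make the "map onto a common model" idea concrete, here is a tiny hand-rolled Python sketch: each system gets its own mapping from local field names to a shared vocabulary, and every row becomes (subject, predicate, object) triples that can then be queried together. It does not use R2RML, Ontop or JSON-LD. The system names, column names and the `example.org` vocabulary are all hypothetical; only the `rdf:type` IRI is the real RDF one.

```python
from typing import Dict, Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/schema#"  # hypothetical shared vocabulary

Triple = Tuple[str, str, str]

# Hypothetical per-system mappings: which column is the key, and which
# source fields map to which predicate in the shared vocabulary.
MAPPINGS = {
    "crm": {"key": "cust_id", "fields": {"full_name": "name", "mail": "email"}},
    "billing": {"key": "account_no", "fields": {"customer_name": "name", "email_addr": "email"}},
}


def to_triples(system: str, rows: Iterable[dict]) -> List[Triple]:
    """Map one system's customer rows onto (subject, predicate, object) triples."""
    m = MAPPINGS[system]
    triples: List[Triple] = []
    for row in rows:
        subject = f"http://example.org/{system}/customer/{row[m['key']]}"
        triples.append((subject, RDF_TYPE, EX + "Customer"))
        for src, pred in m["fields"].items():
            if src in row:  # tolerate rows that lack optional fields
                triples.append((subject, EX + pred, str(row[src])))
    return triples


def customers_by_email(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Group customer IRIs from every system by (lower-cased) email address."""
    groups: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == EX + "email":
            groups.setdefault(o.lower(), set()).add(s)
    return groups


if __name__ == "__main__":
    crm = [{"cust_id": 7, "full_name": "Ada Lovelace", "mail": "ada@example.com"}]
    billing = [{"account_no": "A-19", "customer_name": "A. Lovelace", "email_addr": "ADA@example.com"}]
    triples = to_triples("crm", crm) + to_triples("billing", billing)
    print(customers_by_email(triples))
```

In a real setup the mappings would be declared in R2RML (or exposed virtually through Ontop) and queried with SPARQL. The point here is only that once both systems speak the same predicates, a cross-system lookup such as matching customers by email becomes a simple scan.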

HelloNurse 1494 days ago

It helps if this machinery can reject data and thus perform validation. Since recursive construction of union types (valid records can look like this, or also like that...) is trivial, a programmer somewhere has to draw the line between "loosen the schema to allow this record" and "reject this record to enforce the schema".
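
The trade-off described here can be shown with a union type that either widens or rejects. In this minimal Python sketch a record's "type" is simplified to its field names and Python value types, the schema is a set of allowed shapes (a union), and a caller-supplied policy decides whether an unseen shape loosens the schema or gets rejected. `UnionSchema`, `ingest` and the example policy are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Set, Tuple

Shape = FrozenSet[Tuple[str, str]]


def shape(record: dict) -> Shape:
    """A deliberately simple record 'type': its field names paired with Python type names."""
    return frozenset((k, type(v).__name__) for k, v in record.items())


@dataclass
class UnionSchema:
    # Each allowed shape is one member of the union type.
    members: Set[Shape] = field(default_factory=set)

    def validate(self, record: dict) -> bool:
        return shape(record) in self.members

    def widen(self, record: dict) -> None:
        # "Loosen the schema to allow this record": add its shape as a new member.
        self.members.add(shape(record))


def ingest(schema: UnionSchema, records: List[dict],
           may_widen: Callable[[dict], bool]) -> Tuple[List[dict], List[dict]]:
    """Accept conforming records; for the rest, the policy decides: widen or reject."""
    accepted, rejected = [], []
    for r in records:
        if schema.validate(r):
            accepted.append(r)
        elif may_widen(r):
            schema.widen(r)
            accepted.append(r)
        else:
            rejected.append(r)
    return accepted, rejected


if __name__ == "__main__":
    schema = UnionSchema({shape({"id": 1, "name": "x"})})
    batch = [
        {"id": 2, "name": "Bob"},                     # matches the existing member
        {"id": 3, "name": "Cy", "email": "c@x.org"},  # new shape, policy allows widening
        {"name": "no id"},                            # new shape, policy rejects it
    ]
    # Hypothetical policy: only widen for records that still carry an integer id.
    ok, bad = ingest(schema, batch, lambda r: isinstance(r.get("id"), int))
    print(len(ok), len(bad), len(schema.members))  # 2 1 2
```

With a policy that always returns `True`, nothing is ever rejected and the union simply grows by one member per new shape, which is why someone has to write that policy deliberately.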

mccanne 1494 days ago

Author here. Agreed! Validation is important. While I didn't make this point in the article, our thinking is that schema validation does not require the serialization format to use schemas as its building block: you can always implement schema (or type) validation, and versioning, on top of super-structured data (as can also be done with document databases).
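
The idea that validation can sit on top of the data, rather than inside the serialization format, can be sketched as two separate steps: decode each record with no schema at all, then check it against whichever schema versions exist. In this Python sketch plain JSON lines stand in for a self-describing format (JSON is far less richly typed than super-structured data), and `SCHEMAS`, `conforms` and `read_and_validate` are hypothetical.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical schema versions kept outside the data: field name -> exact Python type.
SCHEMAS: Dict[int, Dict[str, type]] = {
    1: {"id": int, "name": str},
    2: {"id": int, "name": str, "email": str},
}


def conforms(record: dict, schema: Dict[str, type]) -> bool:
    """Exact match: same field names, and each value has exactly the declared type."""
    if set(record) != set(schema):
        return False
    # `type(...) is` rather than isinstance, so True is not accepted as an int.
    return all(type(record[k]) is t for k, t in schema.items())


def read_and_validate(lines: List[str]) -> List[Tuple[dict, List[int]]]:
    """Decode without any schema, then report which schema versions each record satisfies."""
    results = []
    for line in lines:
        record = json.loads(line)  # self-describing data: decoding needs no schema
        versions = [v for v, s in sorted(SCHEMAS.items()) if conforms(record, s)]
        results.append((record, versions))
    return results


if __name__ == "__main__":
    lines = [
        '{"id": 1, "name": "Ada"}',
        '{"id": 2, "name": "Bob", "email": "bob@example.com"}',
        '{"id": "3", "name": "Cy"}',
    ]
    for record, versions in read_and_validate(lines):
        print(record, versions or "rejected")
```

Because the schemas live outside the data, adding a version 3 means adding an entry to `SCHEMAS`; nothing already written has to be re-encoded. The third line decodes fine but matches no version (its `id` is a string), so it is the validation layer, not the decoder, that rejects it.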

cmollis 1494 days ago

This is a major hassle when converting from Avro (from Kafka, which uses a schema registry, so schemas are not shipped with the Avro data) and storing in Parquet, which requires a schema in the file but lets you 'upgrade' it with another schema when reading it. It would be great to have a binary, protocol-like format (schema-less Avro) and a schema-less columnar storage format, which I guess is what these guys are doing.
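
Reading old data through a newer schema, the 'upgrade' mentioned here, roughly amounts to schema resolution, and the Avro specification spells out the rules for records: fields the reader doesn't know are ignored, and fields the reader adds take their declared default (it is an error if there is none). Below is a simplified Python sketch of just those field-matching rules. It works on already-decoded dicts rather than binary Avro, ignores type promotion, aliases and nested records, and is not the API of any Avro or Parquet library; the `Customer` schemas are hypothetical.

```python
from typing import Any, Dict

# Schemas written as plain dicts in the shape of Avro's JSON schema syntax.
# Hypothetical example: v1 writers had a legacy field; v2 readers add an optional email.
WRITER_V1 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "legacy_code", "type": "string"},
    ],
}
READER_V2 = {
    "type": "record", "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

_NO_DEFAULT = object()


def resolve(record: Dict[str, Any], writer: dict, reader: dict) -> Dict[str, Any]:
    """Project a record decoded with the writer schema onto the reader schema."""
    writer_fields = {f["name"] for f in writer["fields"]}
    out: Dict[str, Any] = {}
    for f in reader["fields"]:
        name = f["name"]
        if name in writer_fields:
            out[name] = record[name]  # known to both sides: copy as-is
        else:
            default = f.get("default", _NO_DEFAULT)
            if default is _NO_DEFAULT:
                raise ValueError(f"reader field {name!r} is not in the writer schema and has no default")
            out[name] = default  # added by the reader: fill in its default
    # Writer-only fields (legacy_code here) are dropped simply by never being copied.
    return out


if __name__ == "__main__":
    old = {"id": 42, "name": "Ada", "legacy_code": "X1"}
    print(resolve(old, WRITER_V1, READER_V2))  # {'id': 42, 'name': 'Ada', 'email': None}
```

Note that the writer schema is still needed at read time; with a schema registry it is fetched by ID rather than shipped with each message, which is the coupling a schema-less format would avoid.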

mccanne 1494 days ago

Hear, hear!