| The transformation is performed at the ToroDB layer. Each document is analyzed in several steps: - The document is received in BSON format (as per the MongoDB wire protocol) and transformed into a KVDocument. KVDocument is an internal abstraction over the concrete representation of a JSON document (i.e., hierarchical, nested sets of key-value pairs). - Then, KVDocuments are split by levels (called subdocuments). - Each subdocument is further split into a subdocument type and the data. The subdocument type is basically an ordered set of the data types of that subdocument. - Subdocuments are matched 1:1 to tables. If there is an existing table for the given subdocument type, the subdocument's data is stored directly there. If there isn't, a new table for that type is created. This means that there is also a 1:1 mapping between attribute names (columns) and key names, which makes the data very readable from a SQL user's perspective. - There is a table called structure that is basically a representation of the JSON object but without the (scalar) data. Think of the JSON object but only the braces and square brackets, plus all the keys (or entries in arrays) that contain objects. There is, per level, a key in this structure that contains the name of the table where the data for this object is stored. This table uses a jsonb field to store this structure, but note that there's no actual data in this jsonb field. - Finally, there's a root table which matches each document with its structure. This is used because structures are frequently reused across many documents. This is one of the biggest factors in significantly reducing the storage required compared to, for example, MongoDB, as the "common" information of each "type of document" is stored only once. This information and more will be added to the project's wiki shortly. 
However, it's very easy to see all of this if you run ToroDB and look at the created tables :) Note: I'm one of the authors of ToroDB. |
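
To make the steps above more concrete, here is a toy, in-memory sketch in Python. It is not ToroDB's code and does not reflect its real schema: the names `tables`, `structures`, `root`, `table_for`, `split` and `insert`, the `t_N` table names, and the `table`/`children` keys are all made up for illustration. It also assumes that every value is either a scalar or a nested object (arrays are left out), and that the "ordered set of data types" can be approximated by sorting the keys of each level.

```python
import json

# All names below are hypothetical; this only mimics the idea, not ToroDB itself.
tables = {}      # subdocument type -> {"name": str, "rows": list of dicts}
structures = {}  # canonical structure (JSON text) -> structure id
root = []        # (document id, structure id) pairs


def table_for(subdoc_type):
    """Return the table for a subdocument type, creating it on first use."""
    if subdoc_type not in tables:
        tables[subdoc_type] = {"name": f"t_{len(tables) + 1}", "rows": []}
    return tables[subdoc_type]


def split(doc_id, doc):
    """Store the scalars of one level and return its data-free structure."""
    scalars = {k: v for k, v in doc.items() if not isinstance(v, dict)}
    # Subdocument type: (key, data type) pairs of this level, ordered by key.
    subdoc_type = tuple((k, type(scalars[k]).__name__) for k in sorted(scalars))
    table = table_for(subdoc_type)
    table["rows"].append({"doc_id": doc_id, **scalars})
    # The structure keeps only the table name and the keys that hold objects.
    children = {k: split(doc_id, v) for k, v in doc.items() if isinstance(v, dict)}
    return {"table": table["name"], "children": children}


def insert(doc_id, doc):
    """Split a document and link it to a (possibly shared) structure."""
    structure = json.dumps(split(doc_id, doc), sort_keys=True)
    # Identical structures are stored once and reused by later documents.
    structure_id = structures.setdefault(structure, len(structures) + 1)
    root.append((doc_id, structure_id))
    return structure_id
```

Inserting two documents with the same shape, such as `{"name": "a", "age": 3, "address": {"city": "x"}}` and `{"name": "b", "age": 4, "address": {"city": "y"}}`, adds one row per level to the same two tables and makes both entries in `root` point at a single shared structure, which is where the storage saving described above comes from.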