| Thanks! - How is data stored internally? During our beta we will be hosting the databases and the underlying table structures are not viewable.
However, on release, you will have full access to inspect the tables and indexes as they will be created within your own database. - What are the tradeoffs with this implementation? Depends on what you compare it to, but let's assume we are looking at the tradeoffs of SFSQL vs straight SQL tables and indexes.
If you needed to store a number of columns in a SQL table and you knew exactly what columns would be queried on (as in filtered on, not just
selected), then you would not be able to beat the speed of a composite index on those exact SQL columns. However, as more columns are
added to that table, and the demand for more of those columns be queryable at index speeds increases, then the possible combinations of
covering indexes grows. In general, more queryable columns end up resulting in more indexes, and if you can't accomodate every possible combination of columns
across those indexes, then someone will hit a query which ends up table scanning, and of course more indexes result in slower inserts as well.
With SFSQL there is a static number of indexes. It's insert speed and query speed might not be the fastest compared to custom composite indexes.
But query speed degrades predictibly as the number of columns queried grows. You might be surprised at the performance given that you can
add and query new attributes on the fly without any tables or indexes to create. - Do you generate indexes on data as it comes in? No, we don't have to do that. Everything is indexed. - How are nested keys handled? Every primitive attribute (non-object) is indexed by it's value and the relation between objects uses oid/poid indexes.
Joins are constructed on the fly that make use of the indexes on the primitives and between the objects. - Are indexes eventually consistent? The indexes are immediately consistent. |
This isn't just slightly slower, it depends a lot on your data how slow this is. It's potentially linear in the length of single-column matches. E.g. a query plan for (region='us' AND product='1234' AND year='2021') could scan the region index for all 'us' tuples, the product index for all '1234' tuples, and the year index for all '2021' tuples. The result might only be 0.0001% of rows, but this might have to scan more index entries than you have rows in the database. In some cases it's probably better to use one index and apply the remaining column filters as you fetch rows from the heap.
If you don't care about performance or only have a few MB of data then maybe you will get away with it, but I've never worked on a project where database performance wasn't an issue.