Hacker News new | ask | show | jobs
by brad0 1589 days ago
Interesting project! I'm curious about how this compares to other tech:

- How is data stored internally?

- What are the tradeoffs with this implementation?

- Do you generate indexes on data as it comes in?

- How are nested keys handled?

- Are indexes eventually consistent?

1 comments

Thanks!

- How is data stored internally?

During our beta we will be hosting the databases and the underlying table structures are not viewable. However, on release, you will have full access to inspect the tables and indexes as they will be created within your own database.

- What are the tradeoffs with this implementation?

Depends on what you compare it to, but let's assume we are looking at the tradeoffs of SFSQL vs straight SQL tables and indexes. If you needed to store a number of columns in a SQL table and you knew exactly what columns would be queried on (as in filtered on, not just selected), then you would not be able to beat the speed of a composite index on those exact SQL columns. However, as more columns are added to that table, and the demand for more of those columns be queryable at index speeds increases, then the possible combinations of covering indexes grows. In general, more queryable columns end up resulting in more indexes, and if you can't accomodate every possible combination of columns across those indexes, then someone will hit a query which ends up table scanning, and of course more indexes result in slower inserts as well. With SFSQL there is a static number of indexes. It's insert speed and query speed might not be the fastest compared to custom composite indexes. But query speed degrades predictibly as the number of columns queried grows. You might be surprised at the performance given that you can add and query new attributes on the fly without any tables or indexes to create.

- Do you generate indexes on data as it comes in?

No, we don't have to do that. Everything is indexed.

- How are nested keys handled?

Every primitive attribute (non-object) is indexed by it's value and the relation between objects uses oid/poid indexes. Joins are constructed on the fly that make use of the indexes on the primitives and between the objects.

- Are indexes eventually consistent?

The indexes are immediately consistent.

It sounds like you have single-column indexes on every column and then would rely on an intersecting query plan (e.g. bitmap heap scan) if you query multiple columns.

This isn't just slightly slower, it depends a lot on your data how slow this is. It's potentially linear in the length of single-column matches. E.g. a query plan for (region='us' AND product='1234' AND year='2021') could scan the region index for all 'us' tuples, the product index for all '1234' tuples, and the year index for all '2021' tuples. The result might only be 0.0001% of rows, but this might have to scan more index entries than you have rows in the database. In some cases it's probably better to use one index and apply the remaining column filters as you fetch rows from the heap.

If you don't care about performance or only have a few MB of data then maybe you will get away with it, but I've never worked on a project where database performance wasn't an issue.

That WOULD be horrible. But we arent doing that. In essence we found that JOINS using optimized indexes are less costly than full table scans (on unindexed attributes).
And just to be clear. This is NOT a mutually exclusive solution. Just as people use a combination of different databases for different projects and even within the same project, SFSQL can be used alongside other solutions. It is especially convenient to use as an additional datastore when you are already storing your data in a SQL DB because there is no additional devops beyond what was already in place for that SQL DB.
"During our beta we will be hosting the databases and the underlying table structures are not viewable."

A problem here is that during your beta I would imagine potential customers will want to understand exactly how the table structures work - I know that I wouldn't move forward with evaluating this product without understanding what it's actually doing for me.

I would hope that the “what” SFSQL is doing for you is answered on our home page, if not our apologies. We are constantly working on describing the benefits of using SFSQL. “How” we do it is a bit of a moving target for now. The purpose of the beta is to access our implementation and make any needed tweaks. With that said we will provide a better answer to the "How", stay tuned. As previously stated upon the release of our self hosted offering all will be revealed.

*edit typo