HN Mirror (Hacker News)

by stavrospap 2234 days ago

Folks, apologies, but I think we got a bit sidetracked here: TileDB does not suffer from the consistency issues mentioned above.

Here is how TileDB performs a new (potentially concurrent with other reads and writes) write:

- It creates a fragment folder (or, on S3, a "prefix" shared by a set of objects, since S3 has no real "folders"), which is timestamped and carries a unique UUID. This fragment is self-contained and represents the entire write (e.g., all cells and all attribute values).

- It writes all data objects under the fragment prefix. Note that TileDB never updates, it always writes new immutable objects.

- After all the PUT requests for the data objects succeed, it creates an empty "ok" object.
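To make the write ordering concrete, here is a minimal Python sketch. It assumes a hypothetical `InMemoryObjectStore` standing in for S3 and a made-up key layout (`<array>/<fragment>/<object>` for data objects, `<array>/<fragment>.ok` for the marker). The fragment naming scheme is also illustrative. None of this is TileDB's actual code or on-disk format.

```python
import time
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for S3: a flat key -> bytes map with PUT/GET/LIST."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def write_fragment(store, array_prefix: str, objects: dict[str, bytes]) -> str:
    """Write one self-contained, immutable fragment and commit it with an ok object."""
    # Timestamped, UUID-tagged fragment name (this naming scheme is an assumption).
    fragment = f"__{time.time_ns()}_{uuid.uuid4().hex}"
    fragment_prefix = f"{array_prefix}/{fragment}/"
    # 1. PUT every data object under the fresh fragment prefix; nothing is overwritten.
    for name, data in objects.items():
        store.put(fragment_prefix + name, data)
    # 2. Only after all PUTs succeeded: create the empty "ok" object last.
    #    If any PUT above raised, we never get here and no ok object exists.
    store.put(f"{array_prefix}/{fragment}.ok", b"")
    return fragment


store = InMemoryObjectStore()
frag = write_fragment(store, "my_array", {"a1.tdb": b"\x01\x02", "a2.tdb": b"\x03"})
print(store.list("my_array/"))
```

Writing the marker last is what makes a fragment all-or-nothing: a crash or failed PUT partway through leaves data objects behind but no ok object, so readers never treat the fragment as committed.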

Here is how TileDB performs a (potentially concurrent with other reads and writes) read:

- It lists the array prefix to find the "ok" objects.

- There are two cases:

1. The ok object is not there for some fragment. That fragment is completely ignored.

2. The ok object is there. Since TileDB writes the ok object last, all the data objects it wrote have been committed and are visible to GET requests. TileDB reads the data objects only with GET requests (not ListObject requests). Due to S3's read-after-write consistency model (https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction...), all those objects will be available for reading with GET (now in all S3 regions), and there will be no errors.
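The read side can be sketched the same way. This minimal Python example again assumes a hypothetical `InMemoryObjectStore` in place of S3 and an illustrative key layout with ok objects sitting directly under the array prefix. It also assumes the reader already knows the data object names (e.g. from the array schema), so it can fetch them with exact-key GETs instead of listing the fragment prefix. This is a sketch of the protocol described above, not TileDB's implementation.

```python
class InMemoryObjectStore:
    """Hypothetical stand-in for S3: a flat key -> bytes map with PUT/GET/LIST."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


def committed_fragments(store, array_prefix: str) -> list[str]:
    """One LIST on the array prefix; keep only fragments that have an ok object."""
    suffix = ".ok"
    names = []
    for key in store.list(array_prefix + "/"):
        rest = key[len(array_prefix) + 1:]
        # Assumed layout: ok objects live directly under the array prefix.
        if "/" not in rest and rest.endswith(suffix):
            names.append(rest[: -len(suffix)])
    return names


def read_array(store, array_prefix: str, object_names: list[str]) -> dict:
    """Read committed fragments only, using exact-key GETs (no listing of their data)."""
    result = {}
    for fragment in committed_fragments(store, array_prefix):
        prefix = f"{array_prefix}/{fragment}/"
        result[fragment] = {name: store.get(prefix + name) for name in object_names}
    return result


store = InMemoryObjectStore()
store.put("arr/__1_aaa/a.tdb", b"old")
store.put("arr/__1_aaa.ok", b"")           # committed fragment
store.put("arr/__2_bbb/a.tdb", b"partial")  # in-flight write: no ok object yet
print(read_array(store, "arr", ["a.tdb"]))
```

Fragments without an ok object (case 1) never reach the GET loop, so an in-progress or failed write is invisible to readers rather than surfacing as a partial result.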

Therefore, TileDB works correctly under S3's eventual consistency model, without errors or surprises. The user doesn't need to handle anything. Our customers have been using TileDB in production for a long time, storing hundreds of TBs of data on S3, and no consistency issue has ever come up.

To summarize, the point xyzzy_plugh is raising is that TileDB does not have ACID guarantees. That is true (we never claimed otherwise) and intentional. We are building a transactional layer outside of the storage engine. The reason is that this transactional layer needs to be a constantly running distributed service, whereas we want the TileDB storage engine to be embeddable and usable without performance regression even by applications that do not need ACID (that is, the majority of our data science applications).