by akarve 2456 days ago
Not quite ;) S3 is the primary data and metadata store, so the rest of the stack is a pure function of S3 data (including Elastic). We don't use git at all yet. We use S3 object versioning and then capture the version, SHA-256, ETag, etc. in a JSONL-based manifest https://open.quiltdata.com/b/quilt-example/tree/.quilt/packa.... That JSONL manifest is simply a "locked list" of all the S3 objects in the package. The same manifests can be checked into git for fork/merge of data sets, but we're still exploring the right way to do that.
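To make the "locked list" idea concrete, here is a minimal sketch of a JSONL manifest in Python. The field names (`logical_key`, `physical_key`, `size`, `sha256`) and the helper functions are hypothetical, chosen for illustration; they are not taken from Quilt's actual manifest schema. Each line pins one object to a specific version and content hash, so the manifest alone is enough to detect a changed object.

```python
import hashlib
import json


def make_entry(logical_key, bucket, key, version_id, data):
    """Build one manifest line for an object (hypothetical schema)."""
    return {
        "logical_key": logical_key,
        # Assumed convention: the S3 version ID travels in the query string.
        "physical_key": f"s3://{bucket}/{key}?versionId={version_id}",
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def dump_manifest(entries):
    """Serialize entries as JSONL: one JSON object per line."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in entries)


def load_manifest(text):
    """Parse JSONL text back into a list of entries, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def verify(entry, data):
    """Check that the bytes match the size and hash locked in the entry."""
    return (
        entry["size"] == len(data)
        and entry["sha256"] == hashlib.sha256(data).hexdigest()
    )
```

Because every entry is a self-contained line, two manifests can be compared line by line, which is part of what makes checking them into git plausible.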

I'll let Kevin answer the database fragments question.


Neat. But wouldn't this create a dependency on S3's versioning and make it hard to port across other clouds?
Not quite. Abstraction layers like MinIO support versioning. More importantly, Quilt manifests only require a "fully qualified physical key" that points to the data. In theory, the manifest can work with any URI: S3, local disk, etc.
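As a rough illustration of why a URI-based physical key is portable, the sketch below parses a key without caring which backend it names. The function name `parse_physical_key`, the returned field names, and the `versionId` query-parameter convention are assumptions made for this example, not Quilt's API.

```python
from urllib.parse import parse_qs, urlsplit


def parse_physical_key(uri):
    """Split a physical key URI into backend-neutral parts (hypothetical helper)."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    # Assumed convention: an optional version ID is carried as ?versionId=...
    version_ids = query.get("versionId")
    # For s3:// the netloc is the bucket and the path is the object key;
    # for file:// the netloc is usually empty and the path is absolute.
    path = parts.path.lstrip("/") if parts.scheme == "s3" else parts.path
    return {
        "scheme": parts.scheme,
        "container": parts.netloc,
        "path": path,
        "version_id": version_ids[0] if version_ids else None,
    }
```

A reader could then dispatch on `scheme` to pick a storage client, so only that dispatch step would need to know about a particular cloud.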