Hacker News new | ask | show | jobs
by betaby 1034 days ago
> For instance, blob storages such as S3 have enabled cloud database providers to offer flexible, unlimited storage (SingleStoreDB even coined the term “bottomless storage” for this).

Can someone please elaborate that? What does it mean in conjunction of S3 and DB. I know how traditional DBs work (PostgreSQL and MySQL). I know how S3 work (opensource implementation like minio). But S3 is not a random access file on block storage which is a prerequirement for PostgreSQL and MySQL. How is that solved for S3 based DBs? Can someone point out to the doc, or even better an opensource implementation.

5 comments

Its a popular design for SQL Data warehouses. I think almost all of them (snowflake, redshift, etc.) store cold data in S3 and hot data on local disk[1][2].

It works well if the data is stored as immutable files (i.e., A log structure merge tree) or is not index at all (classical columnstores). S3 doesn't provide an efficient way to update a file.

[1] https://dl.acm.org/doi/10.1145/2882903.2903741 (snowflake SIGMOD paper) [2] https://dl.acm.org/doi/10.1145/3514221.3526055 (singlestore SIGMOD paper)

ClickHouse has two modes of operation on S3:

1. S3 as main storage with a write-through cache. 2. S3 as a cold tier in tiered storage.

It works well because the data is organized by a set of immutable parts called MergeTree. These data parts are atomically created, merged, and deleted, but never modified.

S3 does not work well with random access... but neither it's needed.

Both BigTable and Spanner use Colossus for storage. Which are for OLTP and random access.
One ends up with a 3-tier access hierarchy for accessing a given page:

Present in buffer pool? -> Present on local disk? -> Retrieve from S3/Azure/GCP.

The challenge becomes optimizing this -- speculatively pulling pages in, background evictions, etc.

Garbage collecting old pages also turns out to be complicated. Doing a full trace for expired versions in secondary storage on disk is slow but conceivable. Doing it across petabytes in the cloud, with all the problematic latencies and reliability issues that come with network access... limits the approaches you can take.

They are not new problems -- DBMS development has always been about juggling the trade-offs in performance of different lvels in the memory hierarchy. But it permits higher scale.

S3 is a key-value store with random access and various pricing / limits. If you can make the disk IO part of your DB map to S3 API there is nothing stopping you other than network latency, and ensuring the way you use S3 is cost effective.
You usually have to redesign the storage from the ground up around S3. Some databases do transparent data tiering with S3 which can work ok too depending on the use case.