| HN Mirror


by cmrdporcupine 1029 days ago

One ends up with a 3-tier hierarchy for accessing a given page:

Present in buffer pool? -> Present on local disk? -> Retrieve from S3/Azure/GCP.
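A minimal sketch of that lookup chain, assuming plain dicts stand in for the local disk cache and the object store, and a simple LRU policy for the buffer pool. The class and method names (`TieredPageStore`, `get`, `_admit`) are hypothetical and not taken from any particular DBMS:

```python
from collections import OrderedDict


class TieredPageStore:
    """Toy 3-tier page lookup: buffer pool -> local disk -> object store."""

    def __init__(self, pool_capacity, local_disk, object_store):
        self.pool = OrderedDict()         # page_id -> bytes, kept in LRU order
        self.pool_capacity = pool_capacity
        self.local_disk = local_disk      # dict standing in for an on-disk cache
        self.object_store = object_store  # dict standing in for S3/Azure/GCP

    def get(self, page_id):
        """Return (page, tier) where tier names where the page was found."""
        # Tier 1: already in memory.
        if page_id in self.pool:
            self.pool.move_to_end(page_id)  # mark as most recently used
            return self.pool[page_id], "buffer_pool"

        # Tier 2: cached on local disk.
        if page_id in self.local_disk:
            page = self.local_disk[page_id]
            tier = "local_disk"
        else:
            # Tier 3: fetch from the object store (raises KeyError if absent)
            # and keep a copy on local disk for next time.
            page = self.object_store[page_id]
            self.local_disk[page_id] = page
            tier = "object_store"

        self._admit(page_id, page)
        return page, tier

    def _admit(self, page_id, page):
        # Assumption: pages are clean, so eviction just drops them
        # with no write-back step.
        self.pool[page_id] = page
        self.pool.move_to_end(page_id)
        while len(self.pool) > self.pool_capacity:
            self.pool.popitem(last=False)  # evict least recently used


store = TieredPageStore(1, {}, {"a": b"A", "b": b"B"})
for pid in ["a", "a", "b", "a"]:
    print(pid, store.get(pid)[1])
```

With a pool capacity of 1, the second read of `a` is served from memory, reading `b` evicts it, and the last read of `a` falls back to the local disk copy instead of going to the object store again.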

The challenge becomes optimizing this -- speculatively pulling pages in, background evictions, etc.

Garbage collecting old pages also turns out to be complicated. Doing a full trace for expired versions in secondary storage on disk is slow but conceivable. Doing it across petabytes in the cloud, with all the problematic latencies and reliability issues that come with network access... limits the approaches you can take.

These are not new problems -- DBMS development has always been about juggling performance trade-offs between different levels of the memory hierarchy. But this design permits higher scale.