by derefr
2117 days ago
As I said in a sibling comment: every version needs to be “hot” / “online” at the same time. The point of this system is to allow random-access OLAP queries against arbitrary historical versions of the system, and even time-series reports that run a given analysis against every available version of the data, hopefully with some degree of parallelism. In matview terms, that means every version of the data needs to be concurrently materialized. Given just 100M keys (let’s call it a 20GB exported snapshot size) and 1M versions, that’s on the order of 20PB of data, and 99.9999% of it is redundant copies of the same information, i.e. the stuff that didn’t change between versions. Solving the problem of the concurrent materializations requiring petabytes of storage for almost-entirely-redundant heap tuples is essentially solving the problem of creating a tuple-deduplicating DBMS storage engine, which is equivalent to the problem of building a versioned embedded database :)
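To make the numbers and the dedup idea concrete, here is a rough Python sketch. The 20GB and 1M figures come from the comment above; everything else (the `DedupStore` class, its `commit`/`get` methods, hashing tuples with SHA-256) is a hypothetical toy for illustration, not how any particular storage engine does it.

```python
import hashlib

SNAPSHOT_BYTES = 20 * 10**9  # ~20GB per exported snapshot (figure from the comment)
VERSIONS = 10**6             # 1M versions (figure from the comment)

# Naive approach: every version fully materialized on its own.
naive_bytes = SNAPSHOT_BYTES * VERSIONS  # 2e16 bytes, i.e. 20PB


class DedupStore:
    """Toy content-addressed store: identical tuples are stored only once."""

    def __init__(self):
        self.blobs = {}     # content hash -> tuple bytes
        self.versions = []  # version number -> {key: content hash}

    def commit(self, snapshot):
        """Record a full snapshot as a new version; return its version number."""
        index = {}
        for key, value in snapshot.items():
            digest = hashlib.sha256(value).hexdigest()
            self.blobs.setdefault(digest, value)  # unchanged tuples are not re-stored
            index[key] = digest
        self.versions.append(index)
        return len(self.versions) - 1

    def get(self, version, key):
        """Random access to any key in any historical version."""
        return self.blobs[self.versions[version][key]]


store = DedupStore()
v0 = store.commit({"a": b"row-1", "b": b"row-2"})
v1 = store.commit({"a": b"row-1", "b": b"row-2-changed"})
```

Both versions stay readable at once, but `store.blobs` holds three tuples rather than four, because `row-1` is shared. Note that this toy still copies the whole key index for every version; a real engine would presumably have to share the index between versions too (for example with some kind of persistent tree), which is a large part of why this ends up looking like a versioned embedded database.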