| HN Mirror


by bonobo3000 4165 days ago

This is a cool idea - the holy grail scenario I'm envisioning is storing all data in the log, i.e.:

1. the transaction log is the central repository for all data
2. much more detailed data is stored, enough that analytics can run off this same source of data

The amount of data stored grows in proportion to the number of updates to a row/piece of data, whereas with a mutable solution it is constant w.r.t. the number of updates to the same data. That is a pretty big scaling difference.
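To make the scaling difference concrete, here is a toy sketch. Everything in it is hypothetical (the function name, the "row-1" key); it just counts stored entries for an append-only log vs. a mutable row after N updates to the same piece of data.

```python
def storage_after_updates(num_updates):
    """Return (log_entries, mutable_entries) after updating one row repeatedly.

    Toy model only: sizes are counted in entries, not bytes.
    """
    log = []   # append-only: every update is kept forever
    row = {}   # mutable: only the latest value survives
    for i in range(num_updates):
        log.append(("row-1", i))   # the log grows by one entry per update
        row["row-1"] = i           # the row is overwritten in place
    return len(log), len(row)


log_size, mutable_size = storage_after_updates(1000)
print(log_size, mutable_size)
```

The log ends up with one entry per update, while the mutable store still holds a single row.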

However, storing that much data translates to much higher costs for HDDs/servers, or possibly lower write performance if the log is stored on something like HDFS.

There would also be performance costs for building and updating a materialized view. Imagine a scenario like this:

Events -> A B C D E F G H I J K
Materialized view M has been computed up to item J (but not K yet)
Read/Query M

Now either writing K incurs the cost of waiting for all dependent views to materialize, or the read on M incurs the cost of updating M.
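A rough sketch of those two options, assuming a made-up view that just counts events. `View`, `write_eager`, `write_lazy` and `read_lazy` are names I invented for illustration, not any real system's API. "Work" here means the number of log entries that have to be folded into the view.

```python
class View:
    """Hypothetical materialized view that counts the events in a log."""

    def __init__(self, log):
        self.log = log      # shared reference to the append-only event list
        self.applied = 0    # how many log entries are already folded in
        self.count = 0      # the materialized result

    def catch_up(self):
        """Fold in all pending events; return how many were applied."""
        pending = self.log[self.applied:]
        self.count += len(pending)
        self.applied = len(self.log)
        return len(pending)


def write_eager(log, views, event):
    """Option 1: the write waits for every dependent view to materialize."""
    log.append(event)
    return sum(v.catch_up() for v in views)   # work paid on the write path


def write_lazy(log, event):
    """Option 2: the write only appends; views are left behind."""
    log.append(event)
    return 0


def read_lazy(view):
    """Option 2 continued: the read pays for bringing the view up to date."""
    work = view.catch_up()
    return view.count, work


log = list("ABCDEFGHIJ")
m = View(log)
m.catch_up()                  # M has been computed up to item J
write_lazy(log, "K")          # K is in the log but not yet in M
print(read_lazy(m))           # the read on M does the catch-up work
```

Either way the same catch-up work gets done; the only choice is whether the writer or the reader waits for it.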

Some fusion of this would be pretty interesting though. For example, what if we just query M without applying any pending updates when there have been fewer than X of them? That gives guarantees similar to an eventually consistent DB - the data could be stale. At least it gives us more control over this tradeoff.
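Something like the following, as a minimal sketch. `CountView` and `max_lag` (the X above) are hypothetical names, and the view is again just an event count so the example stays small.

```python
class CountView:
    """Hypothetical count view with a bounded-staleness read."""

    def __init__(self, log):
        self.log = log      # shared reference to the append-only event list
        self.applied = 0    # entries folded in; also the count itself

    def lag(self):
        """Number of log entries not yet reflected in the view."""
        return len(self.log) - self.applied

    def refresh(self):
        self.applied = len(self.log)

    def read(self, max_lag):
        """Return (count, is_stale); refresh only once lag reaches max_lag."""
        if self.lag() >= max_lag:
            self.refresh()
        return self.applied, self.lag() > 0


log = list("ABCDEFGHIJ")
m = CountView(log)
m.refresh()                     # M is up to date through J
log.append("K")                 # 1 pending update, below the threshold
stale_read = m.read(max_lag=3)  # served without refreshing, flagged stale
log.extend("LM")                # now 3 pending updates
fresh_read = m.read(max_lag=3)  # threshold reached, so the read refreshes
print(stale_read, fresh_read)
```

Setting `max_lag=1` makes every read pay for the refresh, while a large value behaves more like an eventually consistent store.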