by MalcolmDiggs 3696 days ago
I believe you're describing what is typically referred to as an "append only" datastore.

The biggest name in the game is Google BigQuery. From their docs:

"BigQuery tables are append-only. The query language does not currently support either updating or deleting data. In order to update or delete data, you must delete the table, then recreate the table with new data. Alternatively, you could write a query that modifies the data and specify a new results table."

There are other databases, like Datomic, which are less popular. The typical use case is log storage, so search around for databases meant for logging, and I'm sure you'll find a lot more.


Thanks for your suggestion. The fact that data written to BigQuery can still be modified is not so great for my case. I want to be able to make data immutable forever. Nobody should be able to modify or delete it. Malicious alterations should not affect other nodes holding replicated data.
I remember watching a talk about Apache Samza [1] that tries to envision a database model that would be truly append-only. It is based on Apache Kafka, so it would satisfy your "distributed" and "immutable forever" requirements.

The talk was really interesting, but I haven't used Samza yet, so I am not sure how mature it is for use as the canonical data store.

As my college professor would put it, "I don't think these have found their Ullman yet," and if you look at the companies using it now [2], it seems to be mostly stream processing and data analytics.

Another thing: I really don't know how Kafka verifies data authenticity, and in my mind there is not that big a difference between a malicious alteration and a malicious append.

Because if you then use something like CRDTs [3] or DDD-style aggregates [4] on top of your immutable data, your end users would still see their view of the data mutate.

The main thing immutability would give you is a log of all the changes and a simple way to restore from it. And most mutable databases give you that capability as well.
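
To make the point about views concrete, here is a rough sketch in Python. It is not Samza or Kafka code: `Event`, `AppendOnlyLog`, and `materialize` are hypothetical names invented for illustration, and the log lives in memory only. It shows records that are never changed after being written, a derived view that still "mutates" from the user's perspective, and a restore done by replaying a prefix of the log.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One immutable record in the log (hypothetical shape)."""
    key: str
    value: Optional[str]  # None marks a logical delete


class AppendOnlyLog:
    """In-memory log that only exposes append and read."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        # Existing entries are never touched; we only add to the end.
        self._events.append(event)
        return len(self._events) - 1  # offset of the new entry

    def read(self, upto: Optional[int] = None) -> List[Event]:
        # Return a copy of the first `upto` events (all of them if None).
        return list(self._events[:upto])


def materialize(events: List[Event]) -> Dict[str, str]:
    """Fold the events into the 'current state' view that users see."""
    view: Dict[str, str] = {}
    for e in events:
        if e.value is None:
            view.pop(e.key, None)
        else:
            view[e.key] = e.value
    return view


log = AppendOnlyLog()
log.append(Event("email", "a@example.com"))
log.append(Event("email", "b@example.com"))  # looks like an update
log.append(Event("email", None))             # looks like a delete

current = materialize(log.read())            # view after all three events
as_of_first = materialize(log.read(upto=1))  # "restore" to the first write
```

Nothing in the log was ever overwritten, yet `current` no longer contains the email, while `as_of_first` still does. An append alone was enough to change what users see.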

[1] http://www.confluent.io/blog/turning-the-database-inside-out...
[2] https://cwiki.apache.org/confluence/display/SAMZA/Powered+By
[3] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
[4] https://en.wikipedia.org/wiki/Domain-driven_design#Building_...

BigQuery is great for reads and analytics. It's one of the best products I have used. But it may not suit high-frequency inserts (it's not a transactional DB). Also, the insert-only "limitation" may not exist in the future.