Hacker News
by geocar 2781 days ago
When I'm interviewing a database expert, the one that says:

> "You Can Lose a Few Datapoints Here and There"

is not the one I'm going with...

2 comments

Some properties of the data can be exploited to adopt weaker consistency guarantees, which leads to desirable design trade-offs in simplicity and performance. While this can result in data loss, that may be acceptable given that queries can span large time ranges, where one or two missing datapoints do not carry the same weight as a financial miscalculation or loss of life. The same could be said of multiplayer games played on mobile devices with intermittent connectivity. In that domain, the player's moves are fast-forwarded once connectivity is restored, as this makes no observable difference to other players. My point is that it's very dependent on the use case and does not apply across the board.
There's nothing wrong with a special-purpose tool for building approximate graphs, but calling it a "time-series database" or even quoting "inserts-per-second" is intellectually dishonest.
Many SSDs only write in 4 KB blocks, so writing each 64-bit datapoint to disk uncompressed would not only be slow, it would also cause write amplification and wear out the disk sooner. The solution used by many TSDBs, including Prometheus and Influx, involves in-memory batching backed by a WAL (write-ahead log) file. If the in-memory batch or the WAL is lost, you would lose data as well.
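To make the batching-plus-WAL idea concrete, here is a rough Python sketch. It is not how Prometheus or Influx actually implement it; `BatchingWriter`, `replay`, the 16-byte record layout and the `blocks` list (a stand-in for real block files) are all made up for illustration.

```python
import os
import struct

# Hypothetical on-disk record: int64 timestamp + float64 value (16 bytes).
RECORD = struct.Struct("<qd")


class BatchingWriter:
    """Logs every point to a WAL and buffers points into larger batches."""

    def __init__(self, wal_path, batch_size=256):
        self.wal = open(wal_path, "ab")
        self.batch_size = batch_size
        self.batch = []    # in-memory points not yet emitted as a block
        self.blocks = []   # stand-in for large sequential block writes

    def append(self, ts, value):
        # Log first, then buffer. The log write may still sit in process
        # or OS buffers, so a crash at this point can lose the datapoint.
        self.wal.write(RECORD.pack(ts, value))
        self.batch.append((ts, value))
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        # Make the log durable, then emit the whole batch in one go.
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.blocks.append(list(self.batch))
        self.batch.clear()

    def close(self):
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.wal.close()


def replay(wal_path):
    """Rebuild the logged points after a restart."""
    with open(wal_path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % RECORD.size  # ignore a torn final record
    return [RECORD.unpack_from(data, off) for off in range(0, usable, RECORD.size)]
```

The trade-off is in `append`: calling `os.fsync` on every point would close the loss window, but it brings back the small-write cost that batching was meant to avoid. Syncing once per batch means anything written since the last sync can be lost.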
You shouldn't be the one hiring if you can't talk about different scenarios.