| Why would you choose a LSM Tree based storage mechanism for a message queue? The only reason I can come up with would be because it's a read-to-use library you can just plug in which gives OK performance and some handy features because you can use the KV store for other things. But it doesn't scale well and backups with LevelDB are not really easy either (close DB, copy all files). Message queues when they are ordered (at least on the local node/queue level) usually just need some kind of append-only log file. You don't do random reads or writes into the middle of the queue, you only modify the head and tail. InfluxDB, albeit being a time series db has similar write patterns to a message queue, learned it the hard way when they first tried to use a LSM Tree database (LevelDB), then switched to a B+Tree (BoltDB/LMDB) but that also doesn't scale once the DB gets big and the tree has quite some depth.
They kindly did a nice writeup of their journey: https://influxdb.com/docs/v0.9/concepts/storage_engine.html Why not do it simple and use append-only files without complex structure and management? Check out Kafka for a better storage format for message queues of this kind. PS: every message queue should first clearly explain what guarantees it provides. |
The R/W patterns for a message queue are simple:
- Messages are key/value
- key is an autoincrementing id
- Writes are at the end, Reads are from the beginning
- Once a message is processed, it's deleted
So in practice this means that the items are written in an append-only fashion, get merged in bigger chunks, and then get progressively deleted. So at higher levels you don't see the huge latencies due to compaction because all records are deleted. Knowing that keys are only incrementing could also lead to a simple optimization: the compaction phase can be a simple concatenation of files.
So you get an append-only system that progressively removes older entries as they are deleted without resorting to mad science hackery [1]. Why didn't it work for InfluxDB ? All I can guess is that individual entries for each series are all mixed together (InfluxDB wants to be able to manage many series with many tags) and older entries are not deleted as frantically, so you get the latencies we all know with compaction and unpredictable reads.
Now, this is purely theoretical and of course further experimentations are needed to make sure this is correct, but LSM is in my opinion a correct pattern here.
[1] https://gist.github.com/CAFxX/571a1558db9a7b393579