by thesz 3116 days ago
Most current storage backends use a Log-structured Merge Tree (LSMT) or something similar.

The larger layers of an LSMT are enormous and should be accessed or rebuilt as rarely as possible.

Being able to predict whether a given element exists in the larger layers at all is quite a bonus: you can skip reading megabytes of data.
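
To make the "skip the read" idea concrete, here is a minimal sketch in Python. It is not how any particular engine does it: `LargeLayer`, `BloomFilter`, and the `reads` counter are made-up names for illustration, and a plain Bloom filter stands in for the learned model, on the assumption that the predictor never says "absent" for a key that is actually present.

```python
import hashlib


class BloomFilter:
    """Stand-in for the learned existence predictor (assumed: no false negatives)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class LargeLayer:
    """Hypothetical immutable layer; `reads` counts simulated expensive reads."""

    def __init__(self, items):
        self.data = dict(items)
        self.reads = 0
        self.predictor = BloomFilter()
        for key in self.data:
            self.predictor.add(key)

    def get(self, key):
        if not self.predictor.might_contain(key):
            return None  # predictor says "absent": skip the expensive read
        self.reads += 1  # only now touch the big layer
        return self.data.get(key)
```

A lookup for a missing key usually returns without incrementing `reads`, which is the whole benefit. A learned model would replace `BloomFilter` behind the same `might_contain` call, provided it keeps the no-false-negatives property or has a fallback.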

Because the larger layers are rebuilt so rarely, training a deep neural model for them is justified.

I cannot verify that any major SQL DB engine has an LSMT backend, but NoSQL engines use it plenty: https://en.wikipedia.org/wiki/Log-structured_merge-tree

2 comments

So you want an LSM for inserts and the DNN for reads? Seems OK. You still have to update or retrain the DNN after an insert into a larger layer, which will be expensive. So you’d probably get high latency at the 99th percentile (or some other high percentile).
There are no inserts into larger layers, only merges. Merges take a long time (they usually run in the background on a separate thread), and that duration justifies training a new net in parallel with the merge.
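
A rough Python sketch of "train in parallel with the merge", under loud assumptions: `merge_layers`, `train_predictor`, and `merge_and_retrain` are invented names, the "training" is just building an exact key set as a placeholder for a real model, and deletes/tombstones are ignored so the merged key set is simply the union of the input key sets.

```python
import threading


def merge_layers(older, newer):
    """Merge two layers; entries from the newer layer win on key conflicts."""
    merged = dict(older)
    merged.update(newer)
    return merged


def train_predictor(keys):
    """Placeholder for model training: an exact, immutable key set."""
    return frozenset(keys)


def merge_and_retrain(older, newer):
    """Run the merge and the predictor training on separate threads."""
    result = {}

    def do_merge():
        result["layer"] = merge_layers(older, newer)

    def do_train():
        # The merged key set is known up front (assuming no deletes),
        # so training does not have to wait for the merge to finish.
        result["predictor"] = train_predictor(older.keys() | newer.keys())

    threads = [threading.Thread(target=do_merge),
               threading.Thread(target=do_train)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["layer"], result["predictor"]
```

The old layers and their old predictor can keep serving reads until both threads finish, at which point the new layer and new predictor are swapped in together.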
RocksDB is an LSM tree for an SQL engine