HN Mirror


	by jmcminis 3116 days ago
	As it says in the paper, this might be useful for data warehouses. But it’s not coming to Postgres anytime soon. Index updates on the order of seconds to minutes would be too slow for a transactional DB. There is also the cold start problem. How do you start to lay out the data on disk as you begin inserting it? Do you have a pre-trained net and use it at first (inserting where the net thinks the data should be)? The strategy probably differs by index type.
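
One possible answer to the cold start question, as a minimal sketch: fall back to plain binary search until there are enough keys to train on, and drop the model whenever an insert makes it stale. This is a toy with a single least-squares line, not the paper's model, and it says nothing about on-disk layout. `LearnedIndex`, `min_train_size`, and the method names are hypothetical.

```python
import bisect


class LearnedIndex:
    """Sorted numeric keys: binary search until trained, then model + bounded search."""

    def __init__(self, min_train_size=4):
        self.keys = []
        self.min_train_size = min_train_size  # assumed threshold for training
        self.model = None  # (slope, intercept, max_error) once trained

    def insert(self, key):
        bisect.insort(self.keys, key)
        self.model = None  # positions shifted, so the old model is stale

    def train(self):
        n = len(self.keys)
        if n < self.min_train_size:
            return False  # cold start: too little data, keep using binary search
        # Least-squares fit of key -> position.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        slope = 0.0 if var == 0 else cov / var
        intercept = mean_p - slope * mean_k
        # Worst-case prediction error bounds the search window at lookup time.
        max_err = max(abs(p - round(slope * k + intercept))
                      for p, k in enumerate(self.keys))
        self.model = (slope, intercept, max_err)
        return True

    def find(self, key):
        """Return the position of key, or -1 if it is absent."""
        n = len(self.keys)
        if self.model is None:
            lo, hi = 0, n  # untrained: search everything
        else:
            slope, intercept, max_err = self.model
            guess = round(slope * key + intercept)
            lo = min(max(0, guess - max_err), n)
            hi = max(lo, min(n, guess + max_err + 1))
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        return pos if pos < hi and self.keys[pos] == key else -1
```

Every insert throws the model away here, which is exactly the update cost the comment worries about for a transactional workload.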

3 comments

thesz 3116 days ago

Most current storage backends have a Log-structured Merge Tree (LSMT) implementation or something like it.

The larger layers of an LSMT are enormous and should be accessed or rebuilt as rarely as possible.

Being able to predict whether a given element exists in the larger layers at all is quite a bonus. You can skip reading megabytes of data.

Because the larger layers are rebuilt so rarely, training a deep neural model for them is justified.

I cannot verify the existence of an LSMT backend for major SQL DB engines, but NoSQL engines use it aplenty: https://en.wikipedia.org/wiki/Log-structured_merge-tree
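
A minimal sketch of the "skip the read" idea, assuming a classic Bloom filter as a stand-in for the learned existence model (the filter is a swapped-in technique, not the neural predictor discussed here). `BloomFilter`, `LargeLayer`, and the `reads` counter are hypothetical names; the dict stands in for data that would live on disk.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class LargeLayer:
    """Immutable large layer; `reads` counts simulated expensive reads."""

    def __init__(self, items):
        self.data = dict(items)
        self.filter = BloomFilter()
        for key in self.data:
            self.filter.add(key)
        self.reads = 0

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # predictor says absent: skip the read entirely
        self.reads += 1  # simulated disk access
        return self.data.get(key)
```

Since the layer is only rebuilt on a merge, the predictor is built once per layer and never updated in place.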

jmcminis 3116 days ago

So you want an LSM for inserts and the DNN for reads? Seems OK. You still have to update/retrain the DNN after an insert into a larger layer, which will be expensive. So you’d probably get high latency at the 99th percentile (or some other high percentile).

thesz 3116 days ago

There are no inserts into larger layers, only merges. Merges take a long time (they are usually processed in the background by a separate thread), and that duration justifies training a new net in parallel with the merge process.
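
A rough sketch of training alongside the merge, under the assumption that the model only needs the keys of the two input layers, which are known before the merge finishes. `train_model` is a placeholder that returns a set-based predictor rather than a real net, and all names here are hypothetical. In CPython the two threads interleave rather than run truly in parallel, so this only illustrates the structure.

```python
import threading
from heapq import merge as merge_sorted


def merge_layers(a, b):
    """Merge two sorted key lists into one sorted layer without duplicates."""
    out = []
    for key in merge_sorted(a, b):
        if not out or out[-1] != key:
            out.append(key)
    return out


def train_model(keys):
    """Placeholder for training: returns a membership predictor over keys."""
    members = frozenset(keys)
    return lambda key: key in members


def merge_and_retrain(a, b):
    """Run the merge and the training in separate threads, then return both."""
    result = {}

    def merge_job():
        result["layer"] = merge_layers(a, b)

    def train_job():
        # The key set of the merged layer is the union of the inputs.
        result["model"] = train_model(list(a) + list(b))

    threads = [threading.Thread(target=merge_job),
               threading.Thread(target=train_job)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Only now would the new layer and its model be swapped in together.
    return result["layer"], result["model"]
```

Readers keep using the old layers and the old model until both jobs finish, so the retraining cost stays off the read path.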

wolf550e 3116 days ago

RocksDB is an LSMT for an SQL engine

brightball 3116 days ago

Well, Postgres does already have something pretty close in Dexter, made possible by the Hypothetical Indexes extension. Dexter can either automatically create indexes using concurrent index creation or it can build a list that you can load at your convenience.

I'm just waiting for it to make it into one of the big PG providers.

https://medium.com/@ankane/introducing-dexter-the-automatic-...

bretthoerner 3116 days ago

> it’s not coming to postgres anytime soon

Isn't Peloton close? http://pelotondb.io/
