Hacker News new | ask | show | jobs
by nl 3109 days ago
That "Learned Index Structures" makes it pretty clear that Karpathy was right in his widely criticized "Software 2.0" piece.
4 comments

I haven't read that paper (Learned Index Structures), but things like gperf have existed for decades. Are these enhanced data structures dynamic, i.e. unlike gperf which is a static one, does it reoptimize as you insert new elements?

In the case of the hash table, I assume it's using the model to compute the hash function.

No, it doesn't handle inserts. On the other hand, the paper writes:

"An ... approach to handling inserts is to build a delta-index. All inserts are kept in buffer and from time to time merged with a potential retraining of the model."

Under some assumptions it does handle inserts. From [1]: Finally, assume that the inserts follow roughly a similar pattern as the learned CDF; [...] Under these assumptions the model might not need to be retrained at all.

[1] https://www.arxiv-vanity.com/papers/1712.01208v1/

This is not like gperf.
I think it is a bit like gperf. What do you consider the big difference?
I thought it was a stretch when reading that medium post. Now reading this and thinking about it after finishing two undergraduate classes one on operating systems and another on compilers, the machine learning for systems part makes a lot of sense, apart from the heuristics the learned index structures idea is just fascinating.

Another illuminating sentence from the paper was this:

>This leads to an interesting observation: a model which predicts the position given a key inside a sorted array effectively approximates the cumulative distribution function (CDF). We can model the CDF of the data to predict the position as: p = F(Key) ∗ N

Maybe it's just my very limited knowledge as an undergrad but I'm feeling that this can be the start of something big. Another idea that just came to me after is how much of this ML is applicable to the domain of cryptography. In my security class it seemed like much of the famous hash functions for example were somehow "found" in vast space of potential schemes.

I think you got the point though. This is going to be HUGE. But why the downvotes? Job security concern?
It could be. CS and more generally tech related fields have this positive feedback loop where the technology facilitates it's own development.

For example:

The easiest and most abundant thing to learn on the web is unsurprisingly web development.

An ML engineer can use ML to optimize the data structures that he uses for his models.

I could not say the same about fields like biology or physics.

Yeah, it is funny that ML, which is biggest fears driven by the media as the ultimate job destroyers, would put this generation of programmers in danger as well.

However, the paradigm shift is inevitable, once discovered, people will use it, and use it anywhere possible.

Judging by the downvotes some people really don't like that idea.