| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by nl 3155 days ago
	That "Learned Index Structures" makes it pretty clear that Karpathy was right in his widely criticized "Software 2.0" piece.

4 comments

ekr 3155 days ago

I haven't read that paper (Learned Index Structures), but things like gperf have existed for decades. Are these enhanced data structures dynamic, i.e. unlike gperf which is a static one, does it reoptimize as you insert new elements?

In the case of the hash table, I assume it's using the model to compute the hash function.

link

sanxiyn 3155 days ago

No, it doesn't handle inserts. On the other hand, the paper writes:

"An ... approach to handling inserts is to build a delta-index. All inserts are kept in buffer and from time to time merged with a potential retraining of the model."

link

pg314 3155 days ago

Under some assumptions it does handle inserts. From [1]: Finally, assume that the inserts follow roughly a similar pattern as the learned CDF; [...] Under these assumptions the model might not need to be retrained at all.

[1] https://www.arxiv-vanity.com/papers/1712.01208v1/

link

nl 3155 days ago

This is not like gperf.

link

sanxiyn 3155 days ago

I think it is a bit like gperf. What do you consider the big difference?

link

Nydhal 3155 days ago

I thought it was a stretch when reading that medium post. Now reading this and thinking about it after finishing two undergraduate classes one on operating systems and another on compilers, the machine learning for systems part makes a lot of sense, apart from the heuristics the learned index structures idea is just fascinating.

Another illuminating sentence from the paper was this:

>This leads to an interesting observation: a model which predicts the position given a key inside a sorted array effectively approximates the cumulative distribution function (CDF). We can model the CDF of the data to predict the position as: p = F(Key) ∗ N

Maybe it's just my very limited knowledge as an undergrad but I'm feeling that this can be the start of something big. Another idea that just came to me after is how much of this ML is applicable to the domain of cryptography. In my security class it seemed like much of the famous hash functions for example were somehow "found" in vast space of potential schemes.

link

oh-kumudo 3155 days ago

I think you got the point though. This is going to be HUGE. But why the downvotes? Job security concern?

link

Nydhal 3155 days ago

It could be. CS and more generally tech related fields have this positive feedback loop where the technology facilitates it's own development.

For example:

The easiest and most abundant thing to learn on the web is unsurprisingly web development.

An ML engineer can use ML to optimize the data structures that he uses for his models.

I could not say the same about fields like biology or physics.

link

oh-kumudo 3155 days ago

Yeah, it is funny that ML, which is biggest fears driven by the media as the ultimate job destroyers, would put this generation of programmers in danger as well.

However, the paradigm shift is inevitable, once discovered, people will use it, and use it anywhere possible.

link

nl 3155 days ago

Judging by the downvotes some people really don't like that idea.

link