| HN Mirror



	by popcorncolonel 3224 days ago
	I disagree that you need a solid grounding in information theory. Almost all that I've seen of information theory in ML is minimizing the KL divergence, which can be learned by browsing the wiki page.

5 comments

jules 3224 days ago

Well, information theory isn't much more than the logarithm of probability theory, so it doesn't hurt to learn it anyway. The only thing you need to know is that given a probability distribution P there exists a compression scheme that encodes a value X with a message of P_length(X) = log(1/P(X)) bits (log base 2). This can be summarised as BITS = log(1/PROBABILITY).

Entropy is just the average number of bits you need to encode a random value from distribution P with the compression scheme of distribution P, i.e. E_P[P_length(X)].

The KL(P,Q) divergence is the overhead you pay when you encode a random value from distribution P with the compression scheme of distribution Q. Say you're compressing English text but you're using a compressor tailored to Spanish. The KL divergence is how many extra bits you need (on average) compared to encoding the English text with the English compressor:

KL(P,Q) = E_P[Q_length(X)] - E_P[P_length(X)]
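
To make the bits picture concrete, here is a minimal Python sketch of the formula above. The two three-symbol distributions are made up for illustration, and the function names (`code_length`, `expected_length`, `entropy`, `kl`) are my own, not from any library. It assumes Q gives nonzero probability to everything P can produce.

```python
import math

def code_length(p):
    """Bits needed for an outcome of probability p: log2(1/p)."""
    return math.log2(1.0 / p)

def expected_length(source, coder):
    """Average bits per value when values are drawn from `source`
    but encoded with the scheme tailored to `coder`."""
    return sum(px * code_length(coder[x]) for x, px in source.items() if px > 0)

def entropy(p):
    """E_P[P_length(X)]: encode P with its own scheme."""
    return expected_length(p, p)

def kl(p, q):
    """KL(P,Q) = E_P[Q_length(X)] - E_P[P_length(X)]."""
    return expected_length(p, q) - entropy(p)

# Hypothetical toy distributions over three symbols.
P = {"a": 0.5, "b": 0.25, "c": 0.25}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}

print(entropy(P))             # 1.5 bits with the matching scheme
print(expected_length(P, Q))  # 1.75 bits with the mismatched scheme
print(kl(P, Q))               # 0.25 extra bits per value
```

So using the "wrong" compressor here costs a quarter of a bit per symbol on average.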

murbard2 3224 days ago

> information theory isn't much more than the logarithm of probability theory

stealing

srean 3224 days ago

It depends. Not everything that is essential for an automobile engineer is essential for a taxi driver.

sgt101 3224 days ago

Maybe it's more that what is essential for a molecular biologist isn't necessary for a general practitioner? It's just... those conference calls where you're explaining that just because the classifier is working really well now doesn't mean we can use it in production. Those calls can get difficult and annoying, and sometimes the "other side" wins, with predictable results.

ha ha ha!

srean 3224 days ago

You bring up a very important and difficult point: if the decision-making is in the hands of someone who does not understand the nuances well and has neither the time nor the inclination to learn them, what do you do?

If your salary is going to depend on how many models you pushed out and not how well they continued to perform, many will optimize over the number of models pushed out.

A major source of problems (and sometimes a gift) is that you cannot prove an empirical statistical claim true or false in finite time. There is always a non-zero probability that the weirdest thing will happen. It could be just sheer bad luck that the model did so poorly in this cycle.
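
As a rough illustration of the "sheer bad luck" point, here is a small Python sketch. Every number in it is hypothetical, and it assumes each evaluation example is an independent trial, which real evaluation data often is not. It computes the exact binomial chance that a model whose true accuracy is 90% still gets 40 or fewer of 50 examples right in a given cycle.

```python
import math

def prob_at_most(k, n, p):
    """Exact binomial probability of at most k successes in n independent
    trials, each succeeding with probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical numbers, chosen only for illustration.
true_accuracy = 0.9      # assumed long-run accuracy of the model
n_examples = 50          # size of this cycle's evaluation set
observed_correct = 40    # i.e. an observed accuracy of 80%

bad_luck = prob_at_most(observed_correct, n_examples, true_accuracy)
print(f"P(at most {observed_correct}/{n_examples} correct) = {bad_luck:.4f}")
```

The probability is small but never zero, which is exactly why a single bad (or good) cycle does not settle the argument either way.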

eli_gottlieb 3224 days ago

That's not because you need little background in information theory. That's because KL-divergences are such a universal info-theoretic quantity that if you deeply understand them, you understand much to most of information theory.

This is like saying, "You don't need to really know calculus, just integrals."

CuriouslyC 3224 days ago

Information theory is pretty central to model selection.

Cacti 3224 days ago

Information theory and probability are basically the same thing.
