by bbor 743 days ago

Wow, I came into this article angry; idk if their book title accurately conveys the sober, expert analysis it contains! In case anyone else is curious why they’re talking about “leakage” in the first place instead of the existing term “model bias”, here’s the paper they cite in the “compelling evidence” paper that started these two’s saga with the snake oil salesmen: https://www.cs.umb.edu/~ding/history/470_670_fall_2011/paper...

Crux passage:

> Our focus here is on leakage, which is a specific form of illegitimacy that is an intrinsic property of the observational inputs of a model. This form of illegitimacy remains partly abstract, but could be further defined as follows: Let u be some random variable. We say a second random variable v is u-legitimate if v is observable to the client for the purpose of inferring u. In this case we write v ∈ legit{u}.

> A fully concrete meaning of legitimacy is built-in to any specific inference problem. The trivial legitimacy rule, going back to the first example of leakage given in Section 1, is that the target itself must never be used for inference:

> (1) y ∉ legit{y}

So ultimately this is all about bad experimental discipline re: training and test data, in an abstract way? I’ve been staring at this paper for way too long trying to figure out what exactly each “target” is and how it leaks, but I hope that engineering-translation is close.
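
For anyone who wants that translation in code, here's a toy sketch of rule (1), not anything taken from the paper. Everything in it is made up for illustration: `make_rows`, `accuracy`, `assert_legit`, and the `y_copy` column are hypothetical names, and the data is random noise, so no honest predictor should do much better than a coin flip. A feature that just copies the target scores perfectly anyway, which is the trivial kind of leak the rule forbids.

```python
import random


def make_rows(n, seed=0):
    """Toy data (an assumption for illustration): x is noise unrelated to y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)  # the target we want to infer
        # y_copy is a deliberately leaky column: it is the target itself.
        rows.append({"x": rng.random(), "y": y, "y_copy": y})
    return rows


def accuracy(rows, predict):
    """Fraction of rows where the prediction matches the target."""
    return sum(predict(r) == r["y"] for r in rows) / len(rows)


def assert_legit(feature_names, illegitimate):
    """Hypothetical guard: reject features that are not legitimate inputs."""
    bad = sorted(set(feature_names) & set(illegitimate))
    if bad:
        raise ValueError(f"leaky features: {bad}")


rows = make_rows(1000)

# Violates rule (1): the "model" reads the target through y_copy.
leaky_acc = accuracy(rows, lambda r: r["y_copy"])

# Respects rule (1): only x is used, and x carries no signal here.
honest_acc = accuracy(rows, lambda r: int(r["x"] > 0.5))

print(f"leaky: {leaky_acc:.2f}, honest: {honest_acc:.2f}")
```

Real leaks are usually subtler than a literal copy of the target (e.g. a field that only gets filled in after the outcome is known), but the check has the same shape: decide what is actually observable at prediction time and refuse everything else.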