by graycat 2999 days ago

Okay, here's a view of what appears to be part of the course:

We have a course (right, a school application of stuff taught in school!) with two teachers, that is, two sections of the course, each section with its own teacher and its own students. At the end of the two sections, we want to compare the teachers. So we give the same test to all of the students from both sections.

Suppose one section had 20 students and the other one, 25 -- the point here is that we don't ask that the two numbers be equal; fine if they are equal, but we're not asking that they be.

So, there were 45 students. So, get a good random number generator and pick 20 students from the 45 and average their scores; also average the scores of the other 25; then take the difference of the two averages.

That was once. It was resampling. Now, do that 1000 times -- remember, we have a computer to do this for us. So, now we have 1000 differences. If you want, then, "live a little" and do that 2000 times. Or, for A students, do all the combinations of 45 students taken 20 at a time. Ah, heck, let's stick closer to being practical and stay with the 1000.

Now, presto, bingo, drum roll please, may I have the envelope with the actual difference in the actual averages of the actual scores in the two classes.

If that actual difference is out in a tail of the empirical distribution of the 1000 differences from the resamplings, then we have a choice to make:

(1) The two teachers did equally well but just by chance in the luck of the draw of the students one of the teachers seemed to do much better than the other one.

(2) The actual difference is so far out in the tail that we don't believe that the two teachers were equally good, reject the hypothesis that there was no difference, called the null hypothesis, and conclude that the teacher with the higher actual average was actually a better teacher.
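For anyone who wants to try the procedure above, here is a minimal sketch in plain Python. The function name `permutation_test`, the two-sided tail count, the add-one correction, and the made-up scores are all my assumptions for illustration, not part of any course material.

```python
import random
from statistics import mean

def permutation_test(scores_a, scores_b, n_resamples=1000, seed=0):
    """Two-sample resampling test on the difference of the two averages.

    Returns (observed_diff, p_value). The p-value is the share of
    resampled differences at least as far from zero as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    extreme = 0
    for _ in range(n_resamples):
        # Shuffle all students together, then split into groups of the
        # original sizes (e.g. 20 and 25) and difference the averages.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction (an assumption here): the p-value is never exactly 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Made-up scores, purely for illustration: 20 students and 25 students.
data_rng = random.Random(42)
section_a = [data_rng.gauss(75, 10) for _ in range(20)]
section_b = [data_rng.gauss(70, 10) for _ in range(25)]

observed, p = permutation_test(section_a, section_b)
print(f"observed difference = {observed:.2f}, p-value = {p:.3f}")
```

A small p-value says the actual difference is out in a tail, that is, choice (2); a large one is consistent with choice (1). The "A student" version would replace the shuffling loop with all `math.comb(45, 20)` splits, which is on the order of 3 trillion, so the 1000 random shuffles are the practical route.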

Sure, it happened that the real reason was that one section of the course started at 7 AM and was over before the sun came up and the other section was at 11 AM when nearly everyone was awake. We like to f'get about such details! Or, sure, we might get criticized for a poorly controlled experiment.

This is also called a statistical hypothesis test or a two sample test. It is a distribution free test because we are making no assumptions about probability distributions of the student scores, etc. Since we are not assuming a probability distribution, we are not assuming a probability distribution with parameters and, thus, have a non-parametric test. Uh, an example of a probability distribution with parameters is the Gaussian where the parameters are mean and standard deviation.

Such tests go way back in statistics for the social sciences, e.g., educational statistics.

In more recent years, leaders in resampling include B. Efron and P. Diaconis, recently both at Stanford.

Why teach such stuff? Well, some parts of computer science are tweaking old multivariate statistics, especially regression analysis, and calling the results machine learning and/or artificial intelligence, putting out a lot of hype and getting a lot of attention, publicity, students, and maybe consulting gigs. Also the newsies get another source of shocking headlines to get eyeballs for the ad revenue -- write about AI and the old "take over the world ploy"!

So, maybe now some profs of applied statistics, what for a while was called mathematical sciences, etc., or other profs of applied math want to get in on the party. Maybe.

What can be done with resampling tests? I don't know that there is any significant market for such: Long ago I generalized such things to a curious multidimensional case and published the results in Information Sciences. The work was a big improvement on what we were doing in AI at IBM's Watson lab for zero day monitoring of high end server farms and networks. Still, I doubt that my paper has ever been applied.

One of the best areas for applied statistics is the testing of medical drugs. Maybe at times resampling plans have been useful there.

I have a conjecture that resampling plans are closely tied to the now classic result in mathematical statistics that order statistics are always sufficient statistics. Sufficient statistics is cute stuff, from the Radon-Nikodym theorem in measure theory and, in particular, from a 1940s paper of Halmos and Savage, then both at the University of Chicago. Some of the interest is that sample mean and sample variance are sufficient for Gaussian distributed data, and that means that, given such data, you can always do just as well in statistics with only the sample mean and sample variance and otherwise just throw away the data. IIRC E. Dynkin, student of Kolmogorov and Gel'fand, long at Cornell, has a paper that this result for the Gaussian is in a sense unstable: If the distribution is only approximately Gaussian, then the sufficiency claim does not hold.
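For the Gaussian sufficiency remark, the standard textbook factorization can be sketched as follows. The symbols here ($x_i$ for the data, $\bar{x}$ and $s^2$ for the sample mean and sample variance, $\mu$ and $\sigma^2$ for the parameters) are my notation, assuming $n$ independent observations.

```latex
f(x_1,\dots,x_n \mid \mu,\sigma^2)
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right),
\qquad
\sum_{i=1}^{n}(x_i-\mu)^2 = (n-1)\,s^2 + n\,(\bar{x}-\mu)^2,

\text{where}\quad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
```

So the joint density depends on the data only through $\bar{x}$ and $s^2$, and by the factorization theorem that pair is sufficient for $(\mu, \sigma^2)$.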

Other applications of resampling, and of such applied math generally, might be in US national security. E.g., maybe monitoring activities in North Korea and looking for significant changes ....

Maybe there would be applications in A/B testing in ad targeting, but I wouldn't hold my breath looking for a job offer to do such from a big ad firm.

For all I know, some Wall Street hedge fund or some Chicago commodities fund uses such statistics to look for significant changes in the markets or anomalies that might be exploited. I doubt it, but maybe! Once I showed my work in anomaly detection to some people at Morgan Stanley, back before the 2008 crash of The Big Short, and there was some interest for monitoring their many Sun workstations but no interest for trading!

Net, IMHO for such applied math: If you can find a serious application, that is, a serious problem where such applied math gives a powerful, valuable solution, the first good or much better solution, with a good barrier to entry, and cheap, fast, and easy to bring on-line and monetize, then be a company founder and go for it. But I wouldn't look for venture funding for such a project before you had revenue significant and growing rapidly and no longer needed equity funding!

Otherwise look for job offers (1) in US national security, (2) medical research, (3) wherever else. But don't hold your breath while waiting.

Now you may just have gotten enough from about 1/3rd of the Berkeley course!

6 comments

subroutine 2998 days ago

What you are describing is known as bootstrapping (if sampling with replacement), jackknifing (if systematically leaving out one observation at a time; more general sampling without replacement is usually called subsampling), or (in the case where you want to run a significance test, and not simply create a distribution or stats like confidence intervals) a permutation test. I think you already know that; I'm just mentioning in case others want to look these up by name. Also, while they can be called 'distribution free', that only means you are not assuming a prefab distribution. If you want to perform a significance test you'll be creating (explicitly or implicitly) a distribution of your calculated statistic (known as the empirical distribution). If you want to be very explicit about this, you can plot a PDF or CDF of your sampled stats just like you could with a Gaussian, exponential, Poisson, etc., distribution.
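To make the with-replacement case and the empirical distribution concrete, here is a rough sketch of a percentile bootstrap for the difference in means, plus an empirical CDF over the resampled statistics. The names `bootstrap_diff_ci` and `empirical_cdf`, the percentile-interval choice, and the made-up scores are assumptions for illustration only.

```python
import random
from bisect import bisect_right
from statistics import mean

def bootstrap_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Each group is resampled WITH replacement at its original size.
    Returns (low, high, sorted_diffs).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high, diffs

def empirical_cdf(sorted_values, x):
    """Fraction of the resampled statistics that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

# Made-up scores, purely for illustration.
data_rng = random.Random(7)
group_a = [data_rng.gauss(75, 10) for _ in range(20)]
group_b = [data_rng.gauss(70, 10) for _ in range(25)]

low, high, diffs = bootstrap_diff_ci(group_a, group_b)
print(f"95% percentile interval: [{low:.2f}, {high:.2f}]")
print(f"share of resampled differences <= 0: {empirical_cdf(diffs, 0.0):.3f}")
```

The sorted `diffs` list is the empirical distribution of the statistic; plotting `empirical_cdf` over a grid of x values gives the CDF mentioned above.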

We teach these methods to our students in intro stats at UC San Diego. Have been for as long as I've been here (5 years). Last year a data science program was also created here at UCSD. I've TA'd a flagship course in that program too. It's almost exactly the same content; the major difference, imo, is the faculty personalities. The stats profs are smug, while the data science profs are energetically self-important. They teach the same shit. Self-motivated students with a STEMy personality tend to learn more in the stats courses because the profs drive on hard core theory; on average though, students do better in the data science course because the profs are so bombastic the kids walk out of each class thinking they are basically ready to join the fellas over at Waymo on some machine learning projects - maybe even show 'em a thing or two, cutting edge tricks learned back at the ol' uni.