Hacker News
by graycat 2999 days ago
Okay, here's a view of what appears to be part of the course:

We have a course (right, a school application of stuff taught in school!) with two teachers, that is, two sections of the course, each section with its own teacher and its own students. At the end of the two courses, that is, the two sections, we want to compare the teachers. So we give the same test to all of the students from both courses.

Suppose one section had 20 students and the other one, 25 -- the point here is that we don't ask that the two numbers be equal; fine if they are equal, but we're not asking that they be.

So, there were 45 students. So, get a good random number generator and pick 20 students from the 45 and average their scores; also average the scores of the other 25; then take the difference of the two averages.

That was one resampling. Now, do that 1000 times -- remember, we have a computer to do this for us. So, now we have 1000 differences. If you want, then, "live a little" and do that 2000 times. Or, for A students, do all the combinations of 45 students taken 20 at a time (about 3.17 trillion of them). Ah, heck, let's stick closer to being practical and stay with the 1000.
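The random re-splitting described above fits in a few lines of Python. This is a minimal sketch, not code from the original post: the scores are hypothetical (generated at random), and `resample_differences` is an illustrative name. Each resample shuffles the pooled 45 scores and cuts them into groups of 20 and 25, which is sampling without replacement; `math.comb(45, 20)` counts the distinct splits that full enumeration would have to visit.

```python
import math
import random
import statistics

def resample_differences(scores_a, scores_b, n_resamples=1000, seed=None):
    """Pool both sections, randomly re-split them into groups of the original
    sizes, and record mean(first group) - mean(second group) each time."""
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling of students, no replacement
        group_a, group_b = pooled[:n_a], pooled[n_a:]
        diffs.append(statistics.fmean(group_a) - statistics.fmean(group_b))
    return diffs

# Hypothetical scores: 20 students in one section, 25 in the other.
rng = random.Random(42)
section_1 = [rng.gauss(75, 10) for _ in range(20)]
section_2 = [rng.gauss(72, 10) for _ in range(25)]

diffs = resample_differences(section_1, section_2, n_resamples=1000, seed=1)
print(len(diffs))         # number of resampled differences
print(math.comb(45, 20))  # number of distinct 20/25 splits (full enumeration)
```

Fixing `seed` makes a run reproducible; leaving it as `None` gives fresh random splits each time.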

Now, presto, bingo, drum roll please, may I have the envelope with the actual difference in the actual averages of the actual scores in the two classes.

If that actual difference is out in a tail of the empirical distribution of the 1000 differences from the resamplings, then we have a choice to make:

(1) The two teachers did equally well but, just by chance in the luck of the draw of the students, one of the teachers seemed to do much better than the other one.

(2) The actual difference is so far out in the tail that we don't believe that the two teachers were equally good, so we reject the hypothesis that there was no difference, called the null hypothesis, and conclude that the teacher with the higher actual average was actually a better teacher.
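Choosing between (1) and (2) usually comes down to a p-value: the fraction of resampled differences at least as far out in the tail as the actual one. A minimal sketch, assuming a two-sided test, hypothetical scores, and an assumed cutoff of alpha = 0.05 (none of these are specified in the post); the add-one in the p-value is one common convention that counts the actual split as one of the resamples.

```python
import random
import statistics

def permutation_p_value(scores_a, scores_b, n_resamples=1000, seed=None):
    """Two-sided resampling p-value for the difference in mean scores.

    Returns (observed difference, p-value), where the p-value is the fraction of
    random re-splits whose |difference| is at least as large as the observed one."""
    observed = statistics.fmean(scores_a) - statistics.fmean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        # Tiny tolerance so floating-point rounding does not break exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one convention: count the observed split as one of the resamples.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical scores; alpha = 0.05 is an assumed cutoff, not from the post.
rng = random.Random(7)
section_1 = [rng.gauss(80, 8) for _ in range(20)]
section_2 = [rng.gauss(74, 8) for _ in range(25)]
observed, p = permutation_p_value(section_1, section_2, seed=1)
alpha = 0.05
print(f"observed difference {observed:.2f}, p = {p:.3f}")
print("choice (2): reject the null" if p < alpha else "choice (1): luck of the draw")
```

A small p-value says the actual difference sits in the tail; it says nothing about the 7 AM versus 11 AM problem, which is a design issue, not a resampling one.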

Sure, it could be that the real reason was that one section of the course started at 7 AM and was over before the sun came up and the other section was at 11 AM when nearly everyone was awake. We like to f'get about such details! Or, sure, we might get criticized for a poorly controlled experiment.

This is also called a statistical hypothesis test or a two-sample test. It is a distribution-free test because we are making no assumptions about probability distributions of the student scores, etc. Since we are not assuming a probability distribution, we are not assuming a probability distribution with parameters and, thus, have a non-parametric test. Uh, an example of a probability distribution with parameters is the Gaussian where the parameters are mean and standard deviation.

Such tests go way back in statistics for the social sciences, e.g., educational statistics.

In more recent years, leaders in resampling include B. Efron and P. Diaconis, recently both at Stanford.

Why teach such stuff? Well, some parts of computer science are tweaking old multivariate statistics, especially regression analysis, and calling the results machine learning and/or artificial intelligence, putting out a lot of hype and getting a lot of attention, publicity, students, and maybe consulting gigs. Also the newsies get another source of shocking headlines to get eyeballs for the ad revenue -- write about AI and the old "take over the world ploy"!

So, maybe now some profs of applied statistics, what for a while was called mathematical sciences, etc., or other profs of applied math want to get in on the party. Maybe.

What can be done with resampling tests? I don't know that there is any significant market for such: Long ago I generalized such things to a curious multidimensional case and published the results in Information Sciences. The work was a big improvement on what we were doing in AI at IBM's Watson lab for zero day monitoring of high end server farms and networks. Still, I doubt that my paper has ever been applied.

One of the best areas for applied statistics is the testing of medical drugs. Maybe at times resampling plans have been useful there.

I have a conjecture that resampling plans are closely tied to the now classic result in mathematical statistics that, for independent, identically distributed data, the order statistics are always sufficient statistics. Sufficient statistics is cute stuff, from the Radon-Nikodym theorem in measure theory and, in particular, from a 1940s paper of Halmos and Savage, then both at the University of Chicago. Some of the interest is that sample mean and sample variance are sufficient for Gaussian distributed data, and that means that, given such data, you can always do just as well in statistics with only the sample mean and sample variance and otherwise just throw away the data. IIRC E. Dynkin, student of Kolmogorov and Gel'fand, long at Cornell, has a paper showing that this result for the Gaussian is in a sense unstable: If the distribution is only approximately Gaussian, then the sufficiency claim does not hold.
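For the Gaussian claim, the standard textbook route is the factorization theorem (this is the usual derivation, not taken from the papers mentioned). For i.i.d. Gaussian data with mean mu and variance sigma^2, using the identity sum (x_i - mu)^2 = (n-1)s^2 + n(xbar - mu)^2, the likelihood touches the data only through the sample mean and sample variance; and for any i.i.d. density f the product does not care about the order of the observations, which is why the order statistics are sufficient:

```latex
L(\mu,\sigma^2 \mid x_1,\dots,x_n)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\mu)^2/(2\sigma^2)}
  = (2\pi\sigma^2)^{-n/2}
    \exp\!\left(-\frac{(n-1)s^2 + n(\bar{x}-\mu)^2}{2\sigma^2}\right),
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,

\prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} f\bigl(x_{(i)}\bigr),
\qquad x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} .
```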

Other applications of resampling and of such applied math might be in US national security. E.g., maybe monitoring activities in North Korea and looking for significant changes ....

Maybe there would be applications in A/B testing in ad targeting, but I wouldn't hold my breath looking for a job offer to do such from a big ad firm.

For all I know, some Wall Street hedge fund or some Chicago commodities fund uses such statistics to look for significant changes in the markets or anomalies that might be exploited. I doubt it, but maybe! Once I showed my work in anomaly detection to some people at Morgan Stanley, back before the 2008 crash of The Big Short, and there was some interest for monitoring their many Sun workstations but no interest for trading!

Net, IMHO for such applied math: If you can find a serious application, that is, a serious problem where such applied math gives a powerful, valuable solution, the first good or much better solution, with a good barrier to entry, and cheap, fast, and easy to bring on-line and monetize, then be a company founder and go for it. But I wouldn't look for venture funding for such a project before I had revenue that was significant and growing rapidly and no longer needed equity funding!

Otherwise, look for job offers in (1) US national security, (2) medical research, (3) wherever else. But don't hold your breath while waiting.

Now you may just have gotten enough from about 1/3rd of the Berkeley course!


What you are describing is known as bootstrapping (if sampling with replacement), jackknifing (if subsampling without replacement, classically leaving out one observation at a time), or (in the case you want to run a significance test, and not simply create a distribution or stats like confidence intervals) a permutation test. I think you already know that; I'm just mentioning it in case others want to look these up by name. Also, while they can be called 'distribution free', that only means you are not assuming a prefab distribution. If you want to perform a significance test, you'll be creating (explicitly or implicitly) a distribution of your calculated statistic (known as the empirical distribution). If you want to be very explicit about this, you can plot a PDF or CDF of your sampled stats just like you could with a Gaussian, exponential, Poisson, etc., distribution.

We teach these methods to our students in intro stats at UC San Diego, and have been for as long as I've been here (5 years). Last year a data science program was also created here at UCSD. I've TA'd a flagship course in that program too. It's almost exactly the same content; the major difference, imo, is the faculty personalities. The stats profs are smug, while the data science profs are energetically self-important. They teach the same shit. Self-motivated students with a STEMy personality tend to learn more in the stats courses because the profs drive on hard core theory; on average, though, students do better in the data science course because the profs are so bombastic that the kids walk out of each class thinking they are basically ready to join the fellas over at Waymo on some machine learning projects - maybe even show 'em a thing or two, cutting edge tricks learned back at the ol' uni.
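To make the with/without-replacement distinction concrete, here is a minimal sketch on a hypothetical sample of 50 scores: a bootstrap of the mean (resampling with replacement), a leave-one-out jackknife standard error, and an empirical CDF of the bootstrap means. The function names are illustrative, not from any library.

```python
import bisect
import math
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=None):
    """Bootstrap: resample WITH replacement (same size as data), record each mean."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

def jackknife_se_of_mean(data):
    """Jackknife: leave out one observation at a time, then combine the n means."""
    n, total = len(data), sum(data)
    loo_means = [(total - x) / (n - 1) for x in data]
    center = statistics.fmean(loo_means)
    return math.sqrt((n - 1) / n * sum((m - center) ** 2 for m in loo_means))

def ecdf(values):
    """Empirical CDF of the resampled statistics: F(t) = fraction of values <= t."""
    ordered = sorted(values)
    return lambda t: bisect.bisect_right(ordered, t) / len(ordered)

# Hypothetical sample of 50 scores.
rng = random.Random(0)
data = [rng.gauss(70, 12) for _ in range(50)]

boot = bootstrap_means(data, n_resamples=2000, seed=1)
F = ecdf(boot)
print("bootstrap SE:", statistics.stdev(boot))
print("jackknife SE:", jackknife_se_of_mean(data))
print("fraction of bootstrap means <= sample mean:", F(statistics.fmean(data)))
```

For the sample mean, the jackknife standard error works out to exactly s/sqrt(n), which makes a handy sanity check; the bootstrap figure should land close to it.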

Nice!

Yup. Thanks.

> known as the empirical distribution

Yup, and I wrote:

"out in a tail of the empirical distribution"

Yup, "rank" tests, "permutation" tests: With my TeX markup:

E.\ L.\ Lehmann, {\it Nonparametrics: Statistical Methods Based on Ranks,\/}

And, yup, again with my TeX markup,

Bradley Efron, {\it The Jackknife, the Bootstrap, and Other Resampling Plans,\/}
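Lehmann's rank tests and the resampling above combine naturally: replace each score by its rank in the pooled sample, then resample the ranks. A minimal sketch, assuming hypothetical scores and a two-sided test; the statistic is the Wilcoxon-style rank sum for the first group, with tied scores sharing the average of their ranks, and the p-value comes from random re-splits rather than from Lehmann's tables.

```python
import random

def midranks(values):
    """Rank the values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks

def rank_sum_permutation_test(a, b, n_resamples=1000, seed=None):
    """Rank-sum statistic for group a, with a two-sided p-value from randomly
    re-splitting the pooled ranks into groups of the original sizes."""
    ranks = midranks(list(a) + list(b))
    n_a = len(a)
    null_mean = n_a * (len(ranks) + 1) / 2  # expected rank sum of group a under H0
    observed = sum(ranks[:n_a])
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(ranks)
        if abs(sum(ranks[:n_a]) - null_mean) >= abs(observed - null_mean):
            extreme += 1
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical scores for two small sections.
print(rank_sum_permutation_test([61, 70, 68, 75], [80, 72, 88, 91, 77], seed=0))
```

Because only ranks enter, one wild score cannot dominate the statistic the way it can dominate a difference of averages.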

Last time I knew, Roger Wets was at UCSD. He read one of my papers and suggested JOTA where I did publish it!

Why is stats full of goofy names to make everything sound more unique and complex than it is?

There's a lot of value in your posts. Mathematizing problems, when successful, brings elegant solutions with well understood properties. Hence, I don't understand the downvotes you are usually getting.

I'm a pure CS / logician by training, but I've spent a few years trying to expand my expertise into probability theory and stochastic processes. Lots of your advice resonates with me. My MSc advisor recommended that I go through Neveu. He was pretty good; he had been a student of Pontryagin.

In most academic fields, work that mathematizes the field is regarded as the best.

Neveu is elegant beyond belief. I keep my copy close. I was aimed at Neveu by a star student of E. Cinlar, long at Princeton and before that at Northwestern -- long editor in chief of Mathematics of Operations Research. Neveu was a student of M. Loeve at Berkeley. So was L. Breiman, the current darling of machine learning because of his Classification and Regression Trees (CART). Breiman's Probability, as published by SIAM, is generally easier reading than Neveu.

For stochastic processes, there are several relatively different directions to go.

Martingale theory is gorgeous, astounding, amazing, with one of the most powerful inequalities in math; the astounding, tough-to-believe martingale convergence theorem; and likely the shortest proof of the strong law of large numbers.

Then you can do Markov processes more generally. The discrete state space version is important and not too difficult -- Cinlar has a nice introductory text.
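As a small taste of the discrete state space case, here is a minimal sketch for a hypothetical 2-state chain: it finds the stationary distribution pi = pi P by power iteration and checks it against the long-run fraction of time a simulated path spends in each state. It assumes the chain is irreducible and aperiodic, so the limit exists and is unique; the function names are illustrative.

```python
import random

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration: start from the uniform distribution and repeat
    dist <- dist P until it stops changing (assumes irreducible, aperiodic)."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    raise RuntimeError("power iteration did not converge")

def long_run_frequencies(P, start, steps, seed=None):
    """Simulate the chain and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    states = range(len(P))
    state, counts = start, [0] * len(P)
    for _ in range(steps):
        state = rng.choices(states, weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]

# Hypothetical 2-state chain; row i holds the probabilities of moving from state i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))  # solving pi = pi P by hand gives (5/6, 1/6)
print(long_run_frequencies(P, 0, 50_000, seed=0))
```

The agreement between the two numbers is the ergodic theorem for Markov chains at work.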

A high-end direction for Markov processes is potential theory. There are claims that this is the math for exotic options on Wall Street, but I doubt that there have ever been any applications.

There is a big role for second order stationary stochastic processes in electronic engineering. I ran into that for processing ocean wave data for the US Navy. Here the fast Fourier transform added a lot of interest.
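For second order stationary processes, the basic estimate of the power spectrum is the periodogram, and the FFT is what makes it cheap to compute. A minimal sketch in pure Python, assuming a hypothetical "wave record" (a sinusoid at 0.0625 cycles per sample plus Gaussian noise): a textbook recursive radix-2 Cooley-Tukey FFT (input length must be a power of 2) and a raw, unsmoothed periodogram. Serious work would use a tuned library FFT and a smoothed spectral estimate.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def periodogram(x):
    """Raw periodogram |X_k|^2 / n at frequencies k/n (cycles per sample), k = 0..n/2."""
    n = len(x)
    mean = sum(x) / n
    X = fft([v - mean for v in x])  # remove the mean so bin 0 does not dominate
    freqs = [k / n for k in range(n // 2 + 1)]
    power = [abs(X[k]) ** 2 / n for k in range(n // 2 + 1)]
    return freqs, power

# Hypothetical wave record: sinusoid at 0.0625 cycles/sample plus noise.
rng = random.Random(0)
n = 1024
x = [math.sin(2 * math.pi * 0.0625 * t) + rng.gauss(0, 0.5) for t in range(n)]
freqs, power = periodogram(x)
peak = max(range(len(power)), key=power.__getitem__)
print("peak frequency:", freqs[peak])
```

The divide-and-conquer split into even and odd samples is what turns the O(n^2) sum of the plain DFT into O(n log n) work.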

And there's more.

Generally, for a long time, Russia, France, and Japan seemed to emphasize stochastic processes more than the US. But by now I suspect that the US has caught up.

I'd have a tough time believing that very many people with money to hire know enough about high end stochastic processes, or even just Neveu, to hire in those fields. US national security may be about the only hope, that is, outside of academics.

Yes, it appears that some of the quantum field theorists in physics are interested in path integrals.

Uh, I'm disorganized here: There is the field of stochastic optimal control!

As usual for advanced applied math, my suggestion is, outside of academics or US national security, find a valuable application and start a business to make money. That is, don't expect to be hired.

Did Neveu ever take a detour into physics? There is a well-known QFT model with his name on it.

Incidentally, I have a copy of Loeve's old probability textbook - I wonder if it is too out of date to be any use.

Loeve is terrific. The stuff there hasn't changed or been much improved on. Some people regard Loeve's book as French that sounds like English or English written as French. But Loeve students Neveu and Breiman are easier starts.

Basically we're talking graduate probability based heavily on measure theory and some on functional analysis.

Last I heard of Neveu he was back in France and at least in part working on probability for stock markets.

There are other more recent authors. It seems that a big fraction of the best researchers sometimes take a year off and write yet another book comparable with Neveu. Personally, I did enough with Loeve, Neveu, Breiman, and Chung, and rather than giving such material more attention, I want to use that background to move on. For now, I'm concentrating on my startup; it's based on some applied math I derived, but I've done that and programmed it. Tonight I got four hard disks installed in my first server. Three of them are partitioned and formatted, and as I type this the fourth is being partitioned and formatted.

If you're referring to Jacques Neveu, his Wikipedia page lists him as having died in May 2016 [1].

[1] https://en.wikipedia.org/wiki/Jacques_Neveu

Thanks for the recommendation on Neveu; I’m going to check it out. If I recall correctly, Chung mentioned it in his book as well.

Yes, K. L. Chung's book is also competitive.

You could make that argument even for basic applied math, ... don't expect to be hired ... because the people hiring don't know why they are hiring or what to look for.

I loved your post, man. I think you are right about the rebranding of applied stat as ML|AI. I love stuff like this. I did take 3 stat courses in university. Currently I am working as a dev. I took a course in my free time, http://codingthematrix.com/, and loved the programming part of it. Do you happen to know some courses where one would have the stat part as well as the programming part?

For this statistics and applied math, at least anywhere near the level of the Berkeley course in the OP, it's by now old stuff, older than nearly all living programmers! Well, from various subroutine libraries, some open source, some from, IIRC, the US National Institute of Standards and Technology (formerly the National Bureau of Standards), SPSS (Statistical Package for the Social Sciences), SAS (Statistical Analysis System), R, Matlab, Mathematica, LINPACK, CART (Classification and Regression Trees, by L. Breiman and others), and more, there's a LOT of code, from quite good up to highly polished. Mostly now people use such code instead of writing it. For stochastic processes there's code too, e.g., for the fast Fourier transform there is a huge pile of code, for all the different flavors of that curious algorithm.

Well, there is more code to write, but IMHO that would be for relatively advanced techniques or, say, working with terabytes of data instead of megabytes.

If you want to write code for applied statistics, then maybe so indicate, have a portfolio of code, and contact the usual suspects -- US national security and medical research. I'm not optimistic. I've given my opinion -- find a good application and found a startup to monetize it.

It is true that today there is a WSJ article on how technical Wall Street has become, with algorithms for trading. The article has next to nothing on what applied math is being used but does have lots of names, maybe some you could contact. Actually, the article mentions that Goldman Sachs (GS) got hot on such applied math. Well, that was about when I wrote to Fisher Black, of Black-Scholes, there at GS, asking about applied math at GS, and I got back a nice letter from Black saying that he saw no such opportunities. Well, the WSJ article today claims that that time was when GS was getting hot on applied math.

If you want to know about applied math on Wall Street, then try to get an opinion or overview from, say, James Simons.

Again, IMHO, it's academics, US national security, medical research, maybe a few other situations, but best of all, start a business, the money making kind.

At my previous job we used some similar techniques in (1) social media monitoring and in (2) cyber security applications. In (1) we had an applied math team working on it, and in (2) I handed the project over to someone doing a math PhD.

To be fair, resampling wasn't the key to our projects, but we were doing a lot of work understanding probability distributions, which is not entirely unrelated.

I love this. Thanks a lot for taking the time to write it!

Can you or someone else TL;DR that for us ADHD folks?

What I wrote is much shorter than the Berkeley course.

The idea of resampling is just dirt simple; read it again, just the first paragraphs, and you'll get it.