by stevenwu
3311 days ago
Question to spark discussion and to fill potential gaps in my knowledge, not to criticize the article; I very much appreciate the transparency and the build-up from the simple, naive initial approach to the final approach used in production. Is anyone else bothered by the claim, supposedly shown using the bootstrap, that there "is a 100% chance that the new version is better than the current one"?

Maybe I've just never come across such a use of the bootstrap in my encounters with statistics. I know it as a tool for resampling from the observed data to estimate properties of an estimator (mean, variance, what have you) when all you have is a dataset and no knowledge of the underlying distribution. When I saw the bootstrap paired with that probabilistic claim, I expected the author to calculate bootstrapped (100-x)% confidence intervals for both the current and the new weights; if the intervals didn't overlap, you could then claim with (100-x)% confidence that one is better than the other.

Instead, the author creates a new statistic that is a function of both datasets: Z_i = 1 if the new version beats the current one on iteration i (a random subset of the data) and 0 otherwise, and Z_i = 1 for all N = 10000 iterations. The probabilistic claim that new is better than current rests on the fact that Z_i never varied. (I'm also somewhat skeptical that, across so many iterations with random subsets, the new weights beat the current ones every single time.) At most, I think you can say that you simulated subsets of the data and new > current in 100% of them; the current wording suggests an inference that isn't there.

Maybe I should just ask one of my past stats profs. Open to someone enlightening me.
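To make the two approaches concrete, here is a minimal Python sketch of both computations. It assumes paired per-example scores for the two versions (the article's actual metric, data, and resampling scheme aren't reproduced here), and the function names and synthetic data are hypothetical. `bootstrap_win_fraction` averages the Z_i indicator described above over bootstrap resamples; `bootstrap_percentile_ci` computes the per-version percentile intervals described above.

```python
import random
import statistics


def resample_indices(n, rng):
    """Draw n indices with replacement: one bootstrap resample of n examples."""
    return [rng.randrange(n) for _ in range(n)]


def bootstrap_win_fraction(current, new, n_iter=10_000, seed=0):
    """Fraction of resamples in which mean(new) > mean(current), i.e. mean of Z_i.

    Assumption (hypothetical): scores are paired, so current[j] and new[j]
    come from the same example j and both versions see the same resample.
    """
    assert len(current) == len(new)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_iter):
        idx = resample_indices(len(current), rng)
        # Z_i = 1 if the new version beats the current one on this resample
        if statistics.fmean(new[j] for j in idx) > statistics.fmean(current[j] for j in idx):
            wins += 1
    return wins / n_iter


def bootstrap_percentile_ci(values, n_iter=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a single version's scores."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(values[j] for j in resample_indices(len(values), rng))
        for _ in range(n_iter)
    )
    lo = means[int((alpha / 2) * n_iter)]
    hi = means[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical synthetic per-example scores (1 = correct, 0 = wrong).
    data_rng = random.Random(42)
    current = [1 if data_rng.random() < 0.70 else 0 for _ in range(200)]
    new = [1 if data_rng.random() < 0.80 else 0 for _ in range(200)]

    print("win fraction:", bootstrap_win_fraction(current, new, n_iter=1000))
    print("current CI:", bootstrap_percentile_ci(current, n_iter=1000))
    print("new CI:", bootstrap_percentile_ci(new, n_iter=1000))
```

The win fraction only estimates how often the resampled mean difference is positive when this one dataset is resampled; a value of 1.0 means every resample favored the new version, which is a descriptive statement about the resamples rather than a probability that the new version is truly better.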