| Based on the downvotes, it seems people think reducing variance is the same as reducing overfitting. Think of the bias/variance tradeoff as a spotlight that we shine on a bunch of cats, whose eyes reflect the light back when they are open. Each cat's eyes are open or closed at random, and cat eyes are either green or brown. We want to know the distribution of eye colors in parts of the population, which overall is an even 50/50 split. We estimate the distribution in a given location by averaging the eyes we can see there. If the variance is large, then the spotlight is very large, and we learn nothing about any particular location because we just average over the entire population. If the spotlight is small, then we can learn something, but only if there are enough samples in the region where we shine the light. So, what if we start with a large spotlight, and then, when we see a region with a large number of open eyes of one color, narrow the light down to just that region? Won't that let us avoid overfitting while maximizing our ability to learn? Unfortunately it does not, because in a large enough population that is evenly distributed, there will always be pockets that exhibit what appears to be a pattern but is just an accident of which cats happened to open their eyes. Starting with a large spotlight and then zooming into a patterned region is the same as reducing variance with the training data. With a large enough dataset it is always possible to find these accidental patterns and then zoom into them by reducing variance. |
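
To make the "accidental pocket" point concrete, here is a minimal Python sketch of the spotlight scenario. Everything in it is an assumption made for illustration: the cats stand in a line split into fixed, non-overlapping regions, each visible eye color is a fair coin flip, and the names `observe`, `green_fraction`, `zoom_experiment`, and `p_open` are hypothetical. It picks the region that looks most green in one set of observations, then looks at the same region in an independent second set.

```python
import random


def observe(rng, n_cats, p_open):
    """Return one entry per cat: 'G' or 'B' if its eyes are open, else None."""
    # Eye color is a fair coin flip everywhere, so there is no real pattern.
    return [rng.choice("GB") if rng.random() < p_open else None
            for _ in range(n_cats)]


def green_fraction(cats):
    """Fraction of green among the visible eyes, or None if none are open."""
    seen = [c for c in cats if c is not None]
    return seen.count("G") / len(seen) if seen else None


def zoom_experiment(seed=0, n_cats=2000, window=20, p_open=0.5):
    rng = random.Random(seed)
    train = observe(rng, n_cats, p_open)  # what we see while choosing where to zoom
    fresh = observe(rng, n_cats, p_open)  # an independent second look

    # Large spotlight: average over every visible eye in the population.
    wide = green_fraction(train)

    # Small spotlight: score each non-overlapping region, keep the greenest.
    scored = []
    for start in range(0, n_cats, window):
        frac = green_fraction(train[start:start + window])
        if frac is not None:
            scored.append((frac, start))
    best_train, best_start = max(scored)

    # Same region, new observations: did the "pattern" survive?
    fresh_frac = green_fraction(fresh[best_start:best_start + window])
    return {"wide": wide, "best_train": best_train,
            "best_start": best_start, "fresh": fresh_frac}


if __name__ == "__main__":
    print(zoom_experiment())
```

Because the regions tile the whole line, the greenest region can never look less green than the population average, so the zoomed-in estimate is biased upward by construction. The `fresh` value is computed from independent draws of the same 50/50 process, so the apparent pattern in the chosen region has no reason to carry over.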