by Alligaturtle
979 days ago
I'm not sure I would feel comfortable using the Winsorized mean: it doesn't have any particular statistical properties, and it lacks intuitive appeal because it's not clear what the value represents. I can understand a line of logic that would give rise to something like the Winsorized mean. After you look at your data, you see some obvious outliers. It feels dirty to just drop those values (which would lead to the truncated mean) because the true value behind an implausible observation is more likely to be near the extreme than near the center of mass. What to do with those extreme values?

Here's something I now want to experiment with: bootstrapping the extreme values. Take note of the original empirical distribution. Then create a new sample by removing the top and bottom X% of the observations and replacing them with values drawn i.i.d. from the original empirical distribution. This could lead to some values being replaced with the very outliers that we originally wanted to drop. After we do this, record the mean. Then repeat with new samples until we have a distribution of new means. What I am curious about is how the shape of this distribution of means depends on the X% value selected at the beginning. What are some well-known distributions that appear to have outliers? A log-normal distribution, maybe?
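For anyone who wants to try this, here is a rough sketch of the experiment in plain Python. It is only one reading of the procedure described above: the function name `resampled_tail_means`, the parameters `frac`, `n_rounds`, and `seed`, and the choice of a log-normal test sample are all assumptions made for illustration, not an established method.

```python
import random
import statistics


def resampled_tail_means(data, frac, n_rounds=1000, seed=0):
    """Replace the lowest and highest `frac` of `data` with i.i.d. draws
    from the full original sample, record the mean, and repeat.

    `frac` is a fraction per tail (0.05 means 5% from each end) and is
    assumed to be below 0.5. All names here are hypothetical.
    """
    rng = random.Random(seed)
    ordered = sorted(data)
    n = len(ordered)
    k = int(n * frac)            # observations replaced in each tail
    core = ordered[k:n - k]      # middle of the sample, kept as-is
    means = []
    for _ in range(n_rounds):
        # Draw from the ORIGINAL sample, so a dropped outlier can return.
        replacements = rng.choices(ordered, k=2 * k)
        means.append(statistics.fmean(core + replacements))
    return means


if __name__ == "__main__":
    # Assumed test case: a log-normal sample, which tends to show
    # large right-tail values that look like outliers.
    rng = random.Random(42)
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
    for frac in (0.01, 0.05, 0.10, 0.25):
        means = resampled_tail_means(sample, frac)
        print(frac, statistics.fmean(means), statistics.stdev(means))
```

Comparing the spread of `means` across several `frac` values is one simple way to look at how the distribution of means changes with the X% chosen at the start.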
It does feel like a very early-20th-century technique. Nowadays we have many tools that would have been impractical for calculators (the people) but are easy for software.
https://en.m.wikipedia.org/wiki/M-estimator