by blueflow
727 days ago
Look at this SVG from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/1/1a/Boxplot_... When you calculate the box plot using normal distribution parameters, the outliers are outside the outer bracket. If you split the dataset into 4 equal parts, the bracket will be larger because the outliers are still inside it. The methodologies are not equal. This thread is the first time I have heard of people doing the "split dataset into 4 quarters" thing and using that for box plots.
The SVG you've provided clearly shows that the box plot splits the data into 4. The interquartile range (IQR) is clearly marked, and it even includes a comparison with what the standard deviation measure would be.
Secondly, if the data truly came from a normal distribution, there are no outliers. Outliers are data points which cannot be explained by the model, and only those need to be removed. Unless you have a good reason to exclude data points, they should be included. This is why I like the IQR and the median: they are not swayed by a few extreme data points. I think the 1.5*IQR rejection filter is lazy and unjustified. Happy to discuss this point further, as it is a bugbear of mine.
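To make both points concrete, here is a rough Python sketch using only the standard library. It is an illustration under assumptions, not anything taken from the SVG: the helper name `tukey_fences`, the seed, the sample size and the five junk values of 1000.0 are all made up for the example. It shows that quartile-based fences on perfectly normal data still flag a small fraction of points, and that the median and IQR barely move when a few wild values are added while the standard deviation blows up.

```python
import random
from statistics import NormalDist, median, quantiles, stdev


def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper for this example; quartiles come from
    statistics.quantiles, i.e. from splitting the data into 4 parts.
    """
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Theoretical picture for a standard normal distribution.
std_normal = NormalDist(0, 1)
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # about 2.698 standard deviations
frac_flagged = 2 * (1 - std_normal.cdf(upper_fence))  # about 0.7%

# Empirical picture: every point here really is drawn from the model,
# yet the 1.5*IQR rule still marks some of them as "outliers".
rng = random.Random(0)  # arbitrary seed, for reproducibility only
sample = [rng.gauss(0, 1) for _ in range(10_000)]
lo, hi = tukey_fences(sample)
flagged = [x for x in sample if x < lo or x > hi]

# Robustness: add a few wild values (made-up contamination).
contaminated = sample + [1000.0] * 5

print(f"upper fence: {upper_fence:.3f} sigma")
print(f"expected fraction flagged: {frac_flagged:.4%}")
print(f"flagged in sample: {len(flagged)} of {len(sample)}")
print(f"median: {median(sample):.4f} -> {median(contaminated):.4f}")
print(f"stdev:  {stdev(sample):.4f} -> {stdev(contaminated):.4f}")
```

With these assumptions, the fences sit at roughly ±2.7σ, so well under 1% of genuinely normal points land outside them, which is why dropping those points automatically looks hard to justify.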