by creer
978 days ago
Doesn't it depend on the likely reason for the outliers?
- A world with a different distribution than the one you are trying to fit
- A measurement environment subject to bad contacts, noise spikes, or experimental mistakes
- A reporting system with occasional typos
- etc.
Seems to me that what to do with the outliers should be informed by some understanding of the environment. In some cases they should be noted aside while waiting to see whether there is more data "out there" in the outliers' vicinity. In some cases a replacement for the outlier might be "nearby", while in other cases we know nothing about where the replacement should be.
|
Like 1002, 998, 1004, 48723, 2104, 1003, 997...
Estimating the deviation from the mean of the last n readings and ignoring the ones that fall too far out works well. Also calculate the percentage of bad readings and have a way of displaying it.
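
A rough sketch of that idea in Python, in case it helps. Everything named here is my own assumption rather than anything specified above: the class name `RollingOutlierFilter`, the window of 5, the threshold of 3 standard deviations, the 3-reading warm-up, and the `min_spread` floor that keeps a perfectly flat window from rejecting everything. It also assumes the window holds only accepted readings, so one spike does not distort the estimate for the readings that follow.

```python
from collections import deque
from statistics import mean, pstdev


class RollingOutlierFilter:
    """Ignore readings far from the mean of the last n accepted readings."""

    def __init__(self, window=5, threshold=3.0, warmup=3, min_spread=1.0):
        # All defaults are illustrative assumptions, not recommended values.
        self.accepted = deque(maxlen=window)  # last n accepted readings
        self.threshold = threshold            # allowed distance, in std devs
        self.warmup = warmup                  # readings accepted unconditionally
        self.min_spread = min_spread          # floor for the std dev estimate
        self.total = 0
        self.rejected = 0

    def add(self, value):
        """Return True if the reading is accepted, False if it is ignored."""
        self.total += 1
        if len(self.accepted) >= self.warmup:
            centre = mean(self.accepted)
            spread = max(pstdev(self.accepted), self.min_spread)
            if abs(value - centre) > self.threshold * spread:
                self.rejected += 1
                return False
        self.accepted.append(value)
        return True

    @property
    def bad_percentage(self):
        """Percentage of all readings seen so far that were ignored."""
        return 100.0 * self.rejected / self.total if self.total else 0.0


readings = [1002, 998, 1004, 48723, 2104, 1003, 997]
f = RollingOutlierFilter()
kept = [r for r in readings if f.add(r)]
print(kept)
print(f"{f.bad_percentage:.1f}% bad readings")
```

On the series above this keeps the five readings near 1000 and drops 48723 and 2104. Two caveats with this sketch: an outlier that arrives during the warm-up is accepted and pollutes the window, and a genuine shift in level would be rejected indefinitely, which is where the percentage of bad readings becomes a useful alarm.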