Hacker News
by kortex 979 days ago
My surface-level take is that they are similar to M-estimators. M-estimators are more mathematically rigorous, but Winsorized metrics might be easier to compute by hand.

It does feel like a very early-20th-century technique. Nowadays we have many tools that would have been impractical for calculators (the people) but are easy to run in software.

https://en.m.wikipedia.org/wiki/M-estimator
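To make the comparison concrete, here is a minimal sketch in Python (standard library only) of a Winsorized mean next to a Huber M-estimate of location. The Winsorized mean is a single clamp-then-average pass that a person could do by hand; the Huber estimate needs repeated reweighting, which is the kind of work that favors software. The tuning constant `c = 1.345` and the choice to fix the scale at the normalized MAD are assumptions (common conventions, not the only ones), and the sample data is made up.

```python
import statistics


def winsorized_mean(data, k):
    """Mean after clamping the k smallest values up to the (k+1)-th smallest
    and the k largest values down to the (k+1)-th largest."""
    xs = sorted(data)
    n = len(xs)
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2*k < len(data)")
    lo, hi = xs[k], xs[n - 1 - k]
    return statistics.fmean([min(max(x, lo), hi) for x in xs])


def huber_location(data, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted means.

    Assumption: the scale is held fixed at the normalized MAD; c = 1.345 is a
    common tuning constant, not the only choice.
    """
    med = statistics.median(data)
    scale = 1.4826 * statistics.median([abs(x - med) for x in data])
    if scale == 0:
        return med
    mu = med
    for _ in range(max_iter):
        # Weight 1 for small standardized residuals, c/|r| for large ones.
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 55.0]  # made-up sample with one outlier
plain = statistics.fmean(data)
wins = winsorized_mean(data, k=1)
hub = huber_location(data)
print(f"mean={plain:.3f}  winsorized={wins:.3f}  huber={hub:.3f}")
```

Both robust estimates stay near 10 while the plain mean is dragged above 16 by the single outlier; the difference is that the Winsorized version hard-clips a fixed number of points, whereas the Huber weights shrink smoothly with distance.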


I feel statisticians and econometricians tend to take the mean of the logged data.

Recently, we have also started using arcsinh instead of the log, because the function has nice properties[1]; for instance, unlike the log, it is defined at zero and for negative values.

1. https://worthwhile.typepad.com/worthwhile_canadian_initi/201...
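A small illustration with Python's standard `math` module (the values are made up): the log fails at zero and is undefined for negatives, while arcsinh accepts any real number and, for large values, behaves like log(2y), so differences between large values read like log differences. Note that this sketch applies arcsinh with no rescaling, so the log-like behavior only kicks in once |y| is well above 1.

```python
import math

values = [-50.0, -1.0, 0.0, 1.0, 50.0, 1000.0]

# math.asinh accepts every real number and is odd: asinh(-y) == -asinh(y).
transformed = [math.asinh(v) for v in values]

# For large y, asinh(y) ~ log(2y), so gaps between large values
# match log gaps: asinh(1000) - asinh(100) is close to log(10).
large_gap = math.asinh(1000.0) - math.asinh(100.0)
print(transformed)
print(large_gap, math.log(10.0))

# The log, by contrast, raises ValueError at zero.
try:
    math.log(0.0)
except ValueError as err:
    print("log(0) fails:", err)
```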

The reason the arsinh transformation is useful (which the link you posted doesn't mention) is that it is the optimal variance-stabilizing transformation [1] when your data is contaminated by a mixture of additive and multiplicative noise, just as the log transformation is the optimal variance-stabilizing transformation when the noise is purely multiplicative.

Read the Wikipedia article for a more formal explanation.

[1] https://en.m.wikipedia.org/wiki/Variance-stabilizing_transfo...
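A quick simulation sketch of that claim, under an assumed noise model y = μ·exp(η) + ε with η ~ N(0, s²) (multiplicative) and ε ~ N(0, σ²) (additive); the noise levels below are arbitrary choices. For this model Var(y) ≈ μ²s² + σ², and a delta-method argument suggests h(y) = asinh(y / c) with c = σ/s, whose standard deviation should come out roughly equal to s at every μ. The raw standard deviation, by contrast, grows with μ.

```python
import math
import random
import statistics


def simulate(mu, n, sigma_add, s_mult, rng):
    """Draw y = mu * exp(eta) + eps, eta ~ N(0, s_mult^2), eps ~ N(0, sigma_add^2)."""
    return [mu * math.exp(rng.gauss(0.0, s_mult)) + rng.gauss(0.0, sigma_add)
            for _ in range(n)]


rng = random.Random(0)          # fixed seed so the run is reproducible
sigma_add, s_mult = 1.0, 0.1    # assumed additive and multiplicative noise levels
c = sigma_add / s_mult          # scale where the two noise sources are comparable

raw_sd, asinh_sd = {}, {}
for mu in (0, 10, 100, 1000):
    y = simulate(mu, 20000, sigma_add, s_mult, rng)
    raw_sd[mu] = statistics.stdev(y)
    asinh_sd[mu] = statistics.stdev([math.asinh(v / c) for v in y])
    print(f"mu={mu:5d}  sd(y)={raw_sd[mu]:8.3f}  sd(asinh(y/c))={asinh_sd[mu]:.3f}")
```

Near μ = 0 the transform is almost linear (handling the additive noise), and for large μ it behaves like a log (handling the multiplicative noise), which is why a plain log would not work at the low end.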

Is taking logs (or arcsinh or whatever) really all that good an idea if (a) you don't have a good physical model justifying it or (b) your data spans several orders of magnitude?

Yes, in general.

It makes multiplicative relationships (exponential growth, power laws) linear, and it makes the model less sensitive to extreme values. For instance, if the data spans several orders of magnitude, adding or removing a single data point in one of those orders can skew a fit on the raw scale a lot; after log-linearization the effect is much smaller.
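A minimal sketch of the linearization point, using hypothetical noise-free data from y = 3·exp(0.5x): on the raw scale the relationship is curved, but an ordinary least-squares line fitted to (x, log y) recovers the growth rate as its slope and log 3 as its intercept.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical exponential-growth data: y = 3 * exp(0.5 * x), no noise for clarity.
x = [0, 1, 2, 3, 4, 5, 6]
y = [3 * math.exp(0.5 * xi) for xi in x]

# On the log scale the relationship is a straight line: log y = log 3 + 0.5 x.
intercept, slope = ols(x, [math.log(v) for v in y])
print(math.exp(intercept), slope)
```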

It's easy to map the results back to the original scale by exponentiating afterwards.
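One detail worth keeping in mind when back-transforming a summary: exponentiating the mean of the logs gives the geometric mean, not the arithmetic mean, and for data spanning several orders of magnitude the two differ a lot. A small sketch with made-up data (natural logs assumed; the base does not change the result):

```python
import math
import statistics

data = [1.0, 10.0, 100.0, 1000.0]   # spans three orders of magnitude

log_mean = statistics.fmean([math.log(v) for v in data])
back_transformed = math.exp(log_mean)   # the geometric mean, about 10**1.5
arithmetic = statistics.fmean(data)     # 277.75

print(back_transformed, statistics.geometric_mean(data))  # same value
print(arithmetic)
```

`statistics.geometric_mean` (Python 3.8+) computes the same quantity directly; the manual version just makes the log/exp round trip explicit.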

As far as I understand, transforming your data directly can lead to problems. In any case, this is what link functions in generalized linear models[1] do better: they transform the expected value of the response rather than the data itself.

[1] https://en.m.wikipedia.org/wiki/Generalized_linear_model
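To illustrate the difference, here is a hand-rolled sketch of a GLM with a log link, fitted by iteratively reweighted least squares (IRLS). The Poisson family is our choice for illustration (the comment does not name one), and in practice you would use a library such as statsmodels' `GLM` instead. The model is log E[y] = a + b·x, so only the fitted mean is ever logged; that is why it still works on hypothetical count data containing a zero, where log(y) itself is undefined.

```python
import math


def fit_poisson_log_link(x, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = a + b*x for a Poisson GLM by IRLS; returns (a, b)."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start at the overall mean
    for _ in range(max_iter):
        # Accumulate the 2x2 weighted least-squares normal equations.
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = a + b * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu  # working response
            w = mu                    # working weight for Poisson with log link
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        det = sw * swxx - swx * swx
        new_a = (swxx * swz - swx * swxz) / det
        new_b = (sw * swxz - swx * swz) / det
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b


# Noise-free means from log E[y] = 0.5 + 0.3 x: the fit should recover (0.5, 0.3).
x = [0, 1, 2, 3, 4, 5]
y_exact = [math.exp(0.5 + 0.3 * xi) for xi in x]
a_exact, b_exact = fit_poisson_log_link(x, y_exact)

# Hypothetical counts with a zero: log(0) is undefined, but the log link is fine.
counts = [0, 1, 1, 3, 4, 8]
a_cnt, b_cnt = fit_poisson_log_link(x, counts)
print(a_exact, b_exact)
print(a_cnt, b_cnt)
```

The design difference from transforming the data is that OLS on log(y) models E[log y], whereas the log link models log E[y], so exponentiating the fitted linear predictor gives the mean on the original scale directly.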