

by VHRanger 979 days ago

I feel statisticians and econometricians tend to take the mean of the log of the distribution.

Recently, we also started using the arcsinh instead of the log, because the function has nice properties [1].

1. https://worthwhile.typepad.com/worthwhile_canadian_initi/201...
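To make "nice properties" a bit more concrete, here is a minimal Python sketch of two standard facts about arcsinh. It is my own illustration rather than something taken from the linked post, and the sample values are arbitrary: arcsinh is defined at zero and for negative values, where the log is not, and for large x it gets close to log(2x).

```python
import math

# asinh(x) = log(x + sqrt(x^2 + 1)), so it is defined for every real x.
at_zero = math.asinh(0.0)
at_negative = math.asinh(-5.0)  # odd function: equals -asinh(5.0)

# The log is undefined at zero; math.log raises ValueError there.
try:
    math.log(0.0)
    log_defined_at_zero = True
except ValueError:
    log_defined_at_zero = False

# For large x, asinh(x) is close to log(2 * x) (arbitrary sample point).
x = 1000.0
gap = abs(math.asinh(x) - math.log(2.0 * x))

print(at_zero, at_negative, log_defined_at_zero, gap)
```

So on large positive values it behaves like a shifted log, while zeros and negative values do not have to be dropped or offset first.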

2 comments

fjkdlsjflkds 979 days ago

The reason the arsinh transformation is useful (and this is not mentioned in the link you posted) is that it is the optimal variance-stabilizing transformation [1] under the assumption that your data is contaminated by a mixture of additive and multiplicative noise. In the same way, the log transformation is the optimal variance-stabilizing transformation when your data is contaminated only by multiplicative noise.

Read the Wikipedia article for a more formal explanation.

[1] https://en.m.wikipedia.org/wiki/Variance-stabilizing_transfo...
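A rough simulation of that claim, as a sketch under stated assumptions: suppose an observation with mean m has variance SIGMA_ADD^2 + SIGMA_MULT^2 * m^2. Integrating 1 / sqrt(variance) over m then gives asinh(SIGMA_MULT * y / SIGMA_ADD) / SIGMA_MULT as the stabilizing transform. The noise scales, the Gaussian noise, the means, and the names below are all made up for illustration.

```python
import math
import random
import statistics

SIGMA_ADD = 1.0   # additive noise scale (assumed)
SIGMA_MULT = 0.1  # multiplicative noise scale (assumed)

def stabilize(y):
    # Antiderivative of 1 / sqrt(SIGMA_ADD^2 + SIGMA_MULT^2 * m^2) in m.
    return math.asinh(SIGMA_MULT * y / SIGMA_ADD) / SIGMA_MULT

def simulate(mean, n, rng):
    # y = mean * (1 + multiplicative noise) + additive noise
    return [
        mean * (1.0 + SIGMA_MULT * rng.gauss(0.0, 1.0))
        + SIGMA_ADD * rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]

rng = random.Random(0)
raw_sd = {}
stable_sd = {}
for mean in (0.0, 10.0, 100.0, 1000.0):
    ys = simulate(mean, 20000, rng)
    raw_sd[mean] = statistics.stdev(ys)
    stable_sd[mean] = statistics.stdev([stabilize(y) for y in ys])
    print(mean, round(raw_sd[mean], 2), round(stable_sd[mean], 2))
```

Under these assumptions the raw standard deviation grows with the mean, while the standard deviation after the transform should stay roughly constant across all four means.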


bigbillheck 979 days ago

Is taking logs (or arcsinh or whatever) really all that good an idea if (a) you don't have a good physical model justifying it or (b) your data spans several orders of magnitude?


VHRanger 979 days ago

Yes, in general.

It can make nonlinear relationships linear, and it makes the model less sensitive to individual observations. For instance, if the data spans several orders of magnitude, adding or removing one datapoint in one of those orders can skew the untransformed data a lot, and much less so after the log-linearization.

It's easy to map the result back to the original scale by exponentiating afterwards.
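A small sketch of both points, using made-up numbers and hypothetical helper names: average on the log scale, exponentiate back, and compare how far each summary moves when one large datapoint is added. Note that exponentiating the mean of the logs gives the geometric mean, which is not the same thing as the arithmetic mean of the original data.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def mean_via_log(values):
    # Average on the log scale, then exponentiate back to the original scale.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Made-up values spanning several orders of magnitude.
data = [1.0, 10.0, 100.0, 1000.0, 10000.0]
with_extra = data + [1_000_000.0]  # one extra large datapoint

# How much each summary changes when the extra point is added.
arith_shift = arithmetic_mean(with_extra) / arithmetic_mean(data)
log_shift = mean_via_log(with_extra) / mean_via_log(data)

print(arithmetic_mean(data), mean_via_log(data))
print(arith_shift, log_shift)
```

With these values the arithmetic mean moves by a far larger factor than the back-transformed log mean, which is the reduced sensitivity described above.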


SubiculumCode 979 days ago

As far as I understand, directly transforming your data can lead to problems. In any case, it's what link functions do better in generalized linear models [1].

[1] https://en.m.wikipedia.org/wiki/Generalized_linear_model
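One illustration of the difference, as a sketch rather than a reference implementation: with count data that contains zeros, log(y) is undefined for those rows, while a Poisson model with a log link puts the log on the mean, log(E[y]) = b0 + b1 * x, and uses the zeros as they are. The simulated data, the true coefficients, and the hand-rolled Newton fit below are all assumptions for the example. In practice you would use a GLM routine from a statistics package.

```python
import math
import random

def sample_poisson(rng, lam):
    # Knuth's multiplication method; fine for the small rates used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_poisson_log_link(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method on the Poisson likelihood."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu          # gradient with respect to b0
            g1 += (y - mu) * x    # gradient with respect to b1
            h00 += mu             # entries of the information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(1)
TRUE_B0, TRUE_B1 = -0.5, 1.0  # hypothetical true coefficients
xs = [rng.uniform(0.0, 2.0) for _ in range(2000)]
ys = [sample_poisson(rng, math.exp(TRUE_B0 + TRUE_B1 * x)) for x in xs]

zero_count = sum(1 for y in ys if y == 0)  # log(y) is undefined for these rows
b0, b1 = fit_poisson_log_link(xs, ys)
print(zero_count, round(b0, 3), round(b1, 3))
```

The fit should land near the coefficients used to simulate the data without dropping or offsetting any of the zero counts.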
