by olliej 839 days ago
I was always surprised at how easily you get biased sampling when generating random points, even when every input is itself a good random number. Something I often saw students do was essentially normalize({2*rand() - 1, 2*rand() - 1, 2*rand() - 1}) or variations of that (where rand() is "good", not the literal rand(3)), and there are numerous other ways that are more subtly wrong. IIRC the nominally correct way for a sphere specifically is something like a=rand(), b=rand(), and the random point is something like { cos(a)sin(b), sin(b), cos(a)cos(b) }[1]
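For anyone who wants to see the bias rather than take it on faith, here is a minimal sketch comparing the normalize-a-cube-point approach with one standard uniform construction (z uniform in [-1, 1], azimuth uniform in [0, 2π)). The function names and the 0.9 cap threshold are made up for illustration. Note that the recalled formula above is not unit length as written: its squared norm works out to cos²(a) + sin²(b).

```python
import math
import random

def biased_point(rng):
    """Normalize a point drawn uniformly from the cube [-1, 1]^3.

    Directions toward the cube's corners are over-represented, because
    more of the cube's volume lies along those directions.
    """
    while True:
        x, y, z = (2 * rng.random() - 1 for _ in range(3))
        n = math.sqrt(x * x + y * y + z * z)
        if n > 0:  # retry in the (vanishingly unlikely) all-zero case
            return (x / n, y / n, z / n)

def uniform_point(rng):
    """Uniform point on the unit sphere: z uniform, azimuth uniform."""
    z = 2 * rng.random() - 1
    theta = 2 * math.pi * rng.random()
    r = math.sqrt(1 - z * z)
    return (r * math.cos(theta), r * math.sin(theta), z)

def cap_fraction(sampler, n=100_000, seed=0):
    """Fraction of samples landing in the two polar caps |z| > 0.9.

    For a truly uniform sampler this should be close to 0.1.
    """
    rng = random.Random(seed)
    return sum(abs(sampler(rng)[2]) > 0.9 for _ in range(n)) / n

if __name__ == "__main__":
    print("uniform:", cap_fraction(uniform_point))
    print("biased: ", cap_fraction(biased_point))
```

The uniform sampler should report roughly 0.10, while the cube-normalizing one comes out noticeably lower, since the caps sit over face centers rather than corners. Rejecting cube points that fall outside the unit ball before normalizing is another common fix.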

I think the best illustration of "reasonable choices of what random values should be used leading to biased results" is Bertrand's paradox, which I was introduced to via Numberphile/3Blue1Brown: https://www.youtube.com/watch?v=mZBwsm6B280. I am just glad that nothing I have ever needed random sampling for has ever been important :D
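As a rough sketch of the paradox, the three classic ways of picking a "random chord" of a unit circle can be simulated directly. The question is how often the chord is longer than √3, the side of the inscribed equilateral triangle. The function names here are hypothetical, and each sampler just returns a chord length.

```python
import math
import random

def chord_from_endpoints(rng):
    """Pick two uniform points on the circle and join them."""
    a = 2 * math.pi * rng.random()
    b = 2 * math.pi * rng.random()
    return 2 * abs(math.sin((a - b) / 2))

def chord_from_radius(rng):
    """Pick a uniform distance along a radius; chord is perpendicular there."""
    d = rng.random()
    return 2 * math.sqrt(1 - d * d)

def chord_from_midpoint(rng):
    """Pick the chord's midpoint uniformly over the disk.

    A uniform point in the disk has distance sqrt(u) from the center.
    """
    d = math.sqrt(rng.random())
    return 2 * math.sqrt(1 - d * d)

def long_chord_fraction(sampler, n=100_000, seed=0):
    """Fraction of chords longer than sqrt(3)."""
    rng = random.Random(seed)
    return sum(sampler(rng) > math.sqrt(3) for _ in range(n)) / n

if __name__ == "__main__":
    for sampler in (chord_from_endpoints, chord_from_radius, chord_from_midpoint):
        print(sampler.__name__, long_chord_fraction(sampler))
```

The three methods converge on about 1/3, 1/2, and 1/4 respectively, even though each one sounds like a perfectly reasonable reading of "random chord".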

[1] Please don't use this blindly; I'm really just going off a very old recollection. If you need it, google "random sphere sampling" :D


I used to think “normal distributions are everywhere”, but the more math and science I watch on YouTube, the more the central limit theorem pops up. It’s the CLT that’s everywhere; it just brings the normal distribution as its +1.
Indeed. The fundamental insight behind the CLT, i.e. that the sample average is approximately normally distributed for large samples even when the population distribution is not normal (assuming finite variance), is intuitive, yet the theorem is magical.
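A small sketch of that effect, assuming an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice; the function names and sample sizes are made up for illustration). It checks how many sample means fall within one standard error of the true mean, which for a normal distribution would be about 68%.

```python
import math
import random

def sample_means(n, trials=20_000, seed=0):
    """Means of `trials` independent samples of size n from Exp(1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

def within_one_sigma(means, n):
    """Fraction of sample means within one standard error of the true mean.

    Exp(1) has mean 1 and standard deviation 1, so the standard error
    of a mean of n draws is 1 / sqrt(n).
    """
    sigma = 1 / math.sqrt(n)
    return sum(abs(m - 1.0) <= sigma for m in means) / len(means)

if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, within_one_sigma(sample_means(n), n))
```

With n = 1 the fraction is about 0.86 (that is just 1 - e^-2 for the raw exponential), and by n = 50 it settles close to the normal value of roughly 0.68.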
When you have more than one variable, the outcomes get clumpy.