Hacker News
by ozgooen 3822 days ago
Good catch!

Right now the main distribution types are normal and uniform. In the video, I showed normal distributions, which have long tails in both directions.

In this case, a normal distribution isn't really correct, because, as you noted, being less than 0 is exceedingly unlikely.

I believe the correct way to deal with this is to use a lognormal distribution or something that has 0 chance of being less than 0. I don't yet have a simple way of doing this, but it's definitely on the agenda.
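A minimal sketch of the lognormal idea, in plain Python rather than Guesstimate itself. It assumes the [low, high] entry is read as a 90% interval (5th to 95th percentile); that reading, and the helper names `lognormal_from_interval` and `sample_lognormal`, are assumptions made for illustration, not a description of how the tool works.

```python
import math
import random

Z_90 = 1.6448536269514722  # standard-normal z-score of the 95th percentile


def lognormal_from_interval(low, high):
    """Return (mu, sigma) of the underlying normal, reading [low, high] as a 90% interval."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_90)
    return mu, sigma


def sample_lognormal(low, high, n, seed=0):
    """Draw n lognormal samples; every sample is strictly positive."""
    rng = random.Random(seed)
    mu, sigma = lognormal_from_interval(low, high)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


samples = sample_lognormal(2, 10, 10_000)
```

Because the samples are exp() of a normal draw, none of them can fall below 0, and the median sits at the geometric mean of low and high rather than the arithmetic midpoint.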

2 comments

You should be careful about committing too much to particular distributions being the "right" ones. For example, what if I know some variable is an odd integer between 1 and 11 inclusive with no support on any other real number?

Just something to keep in mind when abstracting what a distribution is.
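To make the odd-integer example concrete, here is a small Python sketch of a distribution defined purely by its support. The comment doesn't say how probable each value is, so equal weights are an assumption here, and `sample_odd` is a hypothetical helper name.

```python
import random

# Support from the example: odd integers from 1 to 11 inclusive.
support = list(range(1, 12, 2))


def sample_odd(n, seed=0):
    """Draw n values uniformly from the support (equal weights are assumed)."""
    rng = random.Random(seed)
    return [rng.choice(support) for _ in range(n)]


draws = sample_odd(1000)
```

No parametric family is involved: the distribution is just "a set of values plus a way to draw from it", which is about as general an abstraction as a sampler needs.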

edit: though as a short-hand for entry, Gaussian is usually a pretty good guess. Is there support for µ±σ instead of [low,high] in the works? Or support for numerical distributions?

"Is there support for µ±σ instead of [low,high] in the works?" - There used to be. I'll be considering ways of adding it back.
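For a normal distribution the two notations carry the same information once you fix what coverage [low, high] stands for. A rough Python sketch, assuming a 90% interval (the coverage value and both function names are assumptions for illustration):

```python
from statistics import NormalDist


def interval_to_normal(low, high, coverage=0.90):
    """Convert a central interval [low, high] into (mu, sigma) of a normal."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mu = (low + high) / 2
    sigma = (high - low) / (2 * z)
    return mu, sigma


def normal_to_interval(mu, sigma, coverage=0.90):
    """Convert (mu, sigma) back into the central interval of the given coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma


mu, sigma = interval_to_normal(2, 10)
low, high = normal_to_interval(mu, sigma)
```

The round trip returns the original endpoints, so either form could serve as the entry format with the other derived from it.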

"Or support for numerical distributions?" - By numerical distributions do you mean discrete distributions: like, a 40% chance of being '8' and a 60% chance of being '6'? If so, the answer is no. However, if you use the ternary operator it is possible to do very simple versions of this now. We do support totally random picks of different numbers though, using the pickRandom([3,5,3]) function. http://mathjs.org/docs/reference/functions/pickRandom.html
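For reference, the 40%/60% example looks like this as a weighted draw in plain Python. This is only a sketch of the idea, not Guesstimate or mathjs syntax; `sample_discrete` is a hypothetical helper.

```python
import random


def sample_discrete(values, weights, n, seed=0):
    """Draw n values, picking each with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)


# 40% chance of 8, 60% chance of 6, as in the example above.
draws = sample_discrete([8, 6], [0.4, 0.6], 10_000)
share_of_8 = draws.count(8) / len(draws)
```

With two outcomes this is the same thing a ternary on a uniform random number would do (take 8 when the draw is below 0.4, otherwise 6); the weighted form just extends to more values.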

I meant more along the lines of a user-entered histogram. But that's roughly the same as what you're talking about. It does seem that such a thing must roughly correspond to some internal portion of Guesstimate, anyway. So letting an advanced user punch in a distribution would be handy. Maybe out of scope for this project? I guess really I'm looking for a way to propagate errors through my home-grown datasets :)
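One way a user-entered histogram could be turned into samples, sketched in Python. The input format (bin edges plus counts) and the choice to spread samples uniformly inside each bin are assumptions for illustration; nothing here reflects Guesstimate's internals.

```python
import random


def sample_histogram(edges, counts, n, seed=0):
    """Draw n samples: pick a bin in proportion to its count, then a uniform point in it."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    rng = random.Random(seed)
    bins = rng.choices(range(len(counts)), weights=counts, k=n)
    return [rng.uniform(edges[i], edges[i + 1]) for i in bins]


# Hypothetical data: three bins [0, 1), [1, 2), [2, 4] with counts 5, 0 and 15.
edges = [0, 1, 2, 4]
counts = [5, 0, 15]
samples = sample_histogram(edges, counts, 10_000)
```

Feeding samples like these through a formula is ordinary Monte Carlo error propagation, so a home-grown dataset would only need to be binned first.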
I should add that the negative portion of the answer may not be from sampling the small tail of the Video Length distribution, but is much more likely to be an artifact of how you calculate uncertainty. It might be better to find the median and go out some % in each direction asymmetrically. You can see that the Total Time is roughly flat and then peters out.

Actually, the negative portion could be from sampling the much more substantial negative tail of the Viewers distribution. Either way, constraints seem to be important!
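A quick simulation shows how a wide normal input can push a product below 0, and what a median-plus-percentiles summary looks like. The numbers are made up for illustration (a narrow "length" and a wide "viewers" whose normal tail crosses 0), and taking the total as their product is an assumption about the model in the video.

```python
import random


def percentile(sorted_xs, p):
    """Nearest-rank percentile of an already sorted list, with p in [0, 100]."""
    idx = round(p / 100 * (len(sorted_xs) - 1))
    return sorted_xs[min(len(sorted_xs) - 1, max(0, idx))]


rng = random.Random(0)
n = 20_000

# Hypothetical inputs: length ~ N(5, 1), viewers ~ N(100, 80).
length = [rng.gauss(5, 1) for _ in range(n)]
viewers = [rng.gauss(100, 80) for _ in range(n)]
total = sorted(l * v for l, v in zip(length, viewers))

negative_share = sum(t < 0 for t in total) / n
median = percentile(total, 50)
low, high = percentile(total, 5), percentile(total, 95)
```

In this setup roughly a tenth of the totals come out negative, almost entirely because of the viewers tail, and the 5th percentile itself lands below 0. Reporting the median with separate lower and upper percentiles describes the skew more faithfully than a symmetric ± band, but it doesn't remove the negative samples; only a constraint on the inputs (for example a distribution with no mass below 0) does that.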