by lellotope
2799 days ago
There are at least two issues here: the continuous-versus-discrete issue and the moment issue.

As for the moment issue, the short story is that once you constrain three or four moments, there isn't a general maximum entropy distribution anymore, except for some special idiosyncratic cases with three moments, I think. So the normal is, in some ways, the most conservative distribution you can have in a general, unspecified scenario. You can specify more moments, but then there isn't a single maxent distribution that applies across all third- and fourth-moment scenarios the way the normal does for the first two moments.

As for continuous versus discrete, some caution is warranted, but a lot of the maxent principles still apply, and there are closely related principles (minimum description length, which has been shown to be inferentially equivalent to maximum entropy in a sense) that generalize to the continuous case. If you think of everything as discretized (as is the case with machine representation), there's some work showing that the discretized and continuous cases are related up to a constant (doi: 10.1109/TIT.2004.836702). I realize this is a bit hand-wavy, but it is an HN post.
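A rough numerical sketch of the two points above, not taken from the cited paper. It uses the standard closed-form differential entropies of the normal, Laplace, and uniform distributions at equal variance, and then bins a normal at width `delta` to see how the discrete Shannon entropy compares with the differential entropy. The function names, the bin width, and the 12-sigma truncation are all assumptions made for illustration.

```python
import math

def normal_entropy(sigma):
    # Differential entropy (nats) of N(0, sigma^2): 0.5 * ln(2*pi*e*sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def laplace_entropy(sigma):
    # Laplace with scale b has variance 2*b^2 and entropy 1 + ln(2b)
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def uniform_entropy(sigma):
    # Uniform of width w has variance w^2 / 12 and entropy ln(w)
    w = sigma * math.sqrt(12)
    return math.log(w)

def discretized_normal_entropy(sigma, delta, span=12.0):
    # Shannon entropy (nats) of N(0, sigma^2) binned into cells of width delta.
    # span is a hypothetical truncation: bins cover roughly +/- span * sigma.
    n = int(span * sigma / delta)
    scale = sigma * math.sqrt(2)
    total = 0.0
    for k in range(-n, n + 1):
        lo, hi = (k - 0.5) * delta, (k + 0.5) * delta
        p = 0.5 * (math.erf(hi / scale) - math.erf(lo / scale))
        if p > 0:
            total -= p * math.log(p)
    return total

sigma, delta = 1.0, 0.01
print("normal :", normal_entropy(sigma))
print("laplace:", laplace_entropy(sigma))
print("uniform:", uniform_entropy(sigma))
# For small delta the discrete entropy is close to the differential
# entropy minus ln(delta), i.e. the two differ by a constant offset.
print("binned + ln(delta):", discretized_normal_entropy(sigma, delta) + math.log(delta))
```

At equal variance the normal comes out with the largest entropy of the three, and the binned entropy plus `ln(delta)` lands close to the normal's differential entropy, which is the "related up to a constant" behavior in its simplest form.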
|
I do see the reasoning for choosing the normal, since it is the only distribution with a finite number of non-zero cumulants, and thus, as you nicely pointed out, constraints on a finite number of higher-order moments will not give a unique distribution.
But given the issues we've now mentioned, I'm a bit uneasy about maxent as a derivation of, or an explanation for, the ubiquity of the normal distribution. I'm more comfortable with some of the other derivations demonstrated by Jaynes.
And thank you for the paper reference; will have a proper look at it sometime. It might be related to