by jerf
5348 days ago
Alternatively, it may be simple information theory: a model that takes in 100 bits of specification simply cannot correctly describe a process that has 10,000 bits' worth of degrees of freedom. And that's before we talk about iteration over time, and before we get to the final killer you mention, which is when the models are ruined by their own application to the domain. I think radical underspecification is much more likely than overspecification, really. (Since I encounter this a lot, let me pre-answer one question, which is "What if only 300 bits really matter and the rest don't matter as much?" The answer is that the term "bit" in information theory already encompasses that idea. If you have ten "bits", but they are so highly correlated that they are usually all 0 or all 1, you don't in fact have ten bits in the information-theory sense. Ten bits are, by definition, ten fully independent true-or-false values. Bits-in-memory are not the same as information-theory bits. A real system with 10,000 bits cannot, pretty much by definition, be modeled by 100 bits. If it could, it would be a system with only 100 bits in the first place. Information theory cares about the true degrees of freedom available, not about your particular representation of the system.)
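To make the correlated-bits point concrete, here is a rough Python sketch that measures the empirical Shannon entropy of ten stored bits in two cases. The helper `entropy_bits` and both sample sets are made up for illustration, and the correlated case is simplified to bits that *always* agree rather than "usually" agree.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Assumption for illustration: ten stored bits that always agree,
# so only two of the 1024 possible patterns ever show up.
correlated = [(0,) * 10, (1,) * 10] * 512

# Ten independent fair bits: each of the 1024 patterns appears exactly once.
independent = [tuple((n >> i) & 1 for i in range(10)) for n in range(1024)]

correlated_bits = entropy_bits(correlated)
independent_bits = entropy_bits(independent)

print(f"correlated:  {correlated_bits:.1f} bits")
print(f"independent: {independent_bits:.1f} bits")
```

Both cases occupy ten bits of memory, but the correlated one carries about one bit of information while the independent one carries about ten. That gap is the difference between bits-in-memory and information-theory bits.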
This article speaks of a separate problem: economic models are not tested in any sort of experiment, and are thus prone to overfitting. That makes them unlikely even to approximate well.
Consider a basic multilayer perceptron-style neural network. Overfitting is a well-understood problem in training an MLP. We work around it by training on one part of the data and then measuring the network's accuracy on another part -- much as Carter did in his analysis. If the accuracy is poor, something is adjusted: the size of the hidden layer can be changed, the training set expanded, the duration of the training increased or decreased, or the MLP model discarded entirely.
If enlarging the training set or shortening the training improves accuracy against the test set, we had an overfitting problem.
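The train-on-one-part, test-on-another check can be sketched in a few lines of Python. This is not an MLP: a nearest-neighbour memorizer stands in for an over-trained network, and a single fitted threshold stands in for a more constrained model. The data set, the noise rule (every label at a multiple of 7 is flipped), and all names here are invented for illustration.

```python
def noisy_label(x):
    """True rule is x >= 50; labels at multiples of 7 are flipped as 'noise'."""
    y = int(x >= 50)
    return 1 - y if x % 7 == 0 else y

data = [(x, noisy_label(x)) for x in range(100)]
train = data[0::2]  # even x: used for fitting
test = data[1::2]   # odd x: held out, never seen while fitting

def accuracy(predict, points):
    return sum(predict(x) == y for x, y in points) / len(points)

def memorizer(x):
    """Return the label of the nearest training point (ties go to the lower x)."""
    return min(train, key=lambda p: (abs(p[0] - x), p[0]))[1]

# Constrained model: one cutoff, chosen to maximize accuracy on the training part.
best_t = max(range(101), key=lambda t: accuracy(lambda x: int(x >= t), train))

def threshold(x):
    return int(x >= best_t)

mem_train, mem_test = accuracy(memorizer, train), accuracy(memorizer, test)
thr_train, thr_test = accuracy(threshold, train), accuracy(threshold, test)

print(f"memorizer: train {mem_train:.2f}, test {mem_test:.2f}")
print(f"threshold: train {thr_train:.2f}, test {thr_test:.2f}")
```

The memorizer reproduces its training labels perfectly, noise included, and then does noticeably worse on the held-out half. The threshold model scores lower on the training half but higher on the held-out half. That pattern is the overfitting signature, and it only shows up because part of the data was kept out of the fitting.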