Hacker News new | ask | show | jobs
by manthideaal 2309 days ago
Using Kolmogorov Complexity and then using PCA is not valid since you are approximating a solution and Kolmogorov Complexity is for exact solutions. Anyway perhaps there is, or should be defined, a signal/noise Kolmogorov Complexity measure, that is the shortest length of a program that computes an approximate solution within an epsilon distance of the true solution. Also since PCA is discussed, why not use SVD?

Edited: See (1)for some related ideas: A Safe Approximation for Kolmogorov Complexity

(1) https://link.springer.com/chapter/10.1007/978-3-319-11662-4_...

2 comments

> perhaps there is, or should be defined, a signal/noise Kolmogorov Complexity measure

This is studied in so-called algorithmic rate-distortion theory:

Rooij, S. de, & Vitanyi, P. (2012). Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising. IEEE Transactions on Computers, 61(3), 395–407. https://doi.org/10.1109/TC.2011.25

Vereshchagin, N., & Vitányi, P. (2006). On Algorithmic Rate-Distortion Function. Information Theory, 2006 IEEE International Symposium On, 798–802.

Well stated. The author also misses two other critical points: (1) accuracy is a poor measure of quality for non-numeric, classification type problems, (2) increasing model complexity has an asymptote in order to prevent overfitting. You can’t arbitrarily increase the number of weights and expect that the NN will continue to improve.
The thing about NN is that increasing the weights does improve the performance. The standard way to get good performance and see if your architecture works is just get the network huge (wide). After you see it works you get it small.

The "common wisdom" of "too many parameters will make you overfit" is most definitely not that important for the way modern NN training works.

Overfitting shouldn't be an issue for approximation of a known function, where you can generate an arbitrary amount of "training data". Of course you may not have the resources to do so, but that's a whole different tradeoff.
Even then, it might not really be an issue.

https://youtu.be/5-Kqb80h9rk