by teddykoker
1422 days ago
Link to paper [1]. It looks like the authors construct a basic autoencoder to predict frames in videos of various physical systems (double pendulum, lava lamp, etc.) and then use the Levina–Bickel algorithm [2] to estimate the "intrinsic dimension" of the autoencoder's latent space. They then refer to the estimated intrinsic dimension as the "minimum number of variables required by the system to accurately capture the motion", e.g. 24 for a video of a fireplace.

Personally, I wonder how much information this actually provides about a system. Since the neural network is non-linear, a single latent variable may theoretically function as more than one state variable.

[1]: https://www.nature.com/articles/s43588-022-00281-6.epdf?shar...

[2]: https://www.stat.berkeley.edu/~bickel/mldim.pdf
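For anyone curious what the Levina–Bickel step involves, here is a rough sketch of the maximum-likelihood estimator as I understand it from [2]: for each point, take the distances to its k nearest neighbors, compute the inverse of the mean log-ratio of the k-th distance to the closer ones, and average over all points. This is not the paper's code; the function name, the choice of k=10, and the toy data (a 2D surface embedded non-linearly in 5D, standing in for latent vectors) are all my own assumptions.

```python
import numpy as np

def levina_bickel_dimension(points, k=10):
    """Sketch of the Levina-Bickel MLE of intrinsic dimension.

    points: array of shape (n_samples, n_features), assumed to have no duplicates.
    k: number of nearest neighbors (hypothetical default, not from the paper).
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances (fine for a few thousand points).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    # Per-point estimate: inverse of the mean log-ratio.
    per_point = (k - 1) / log_ratios.sum(axis=1)
    # Average the per-point estimates over the whole data set.
    return per_point.mean()

# Toy data: two underlying variables (u, v) mapped non-linearly into 5 dimensions.
rng = np.random.default_rng(0)
u, v = rng.uniform(0.0, 1.0, size=(2, 1000))
embedded = np.stack([u, v, np.sin(u), np.cos(v), u * v], axis=1)

estimate = levina_bickel_dimension(embedded, k=10)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On data like this the estimate should land near 2 rather than 5, since the estimator only looks at local neighbor distances and not at how many coordinates the embedding uses. The result does depend on k and on the sample size, so treat any single number with some caution.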
It could be that each of these systems has only a few valid states that can be encoded very compactly, and the autoencoders found them.
That could be very useful.