Hacker News new | ask | show | jobs
by radarsat1 4110 days ago
This is really clever. So basically iiuc, they set up a network to encode down to a representation that consists of parameters for a rendering engine. In order to ensure that this is the representation that is learned, the decoding stage is used to re-render the image subject to transformations and perform the decoding based on a an initial reduction phase after rendering. I.e. it is like an autoencoder, but the inner-most reduced representation is forced to be related to a graphics rendering engine by manipulating related transformation parameters.

Not only is this interesting from the point of view of using it for learning how to generate images, but it is a novel way to force a semantic internal representation instead of leaving it up to a regularisation strategy and interpreting the sparse encoding post-hoc. It forces the internal representation to be inherently "tweakable."

1 comments

This can also be used for object recognition against invariant 3D representations, potentially with more accuracy than traditional convolutional neural net architectures.

Consider: their proof-of-concept face-recognition model achieves performance comparable to traditional convnets on faces with varying degree of pose, lighting, shape and texture, even though it was trained completely unsupervised. I would expect this type of model to beat the state of the art in face recognition and other similar tasks when fined-tuned with supervised training in the not-too-distant future.