by tejask 4109 days ago
One of the authors here. You are absolutely right! In fact, I am currently working on something similar, but it is not working as well yet. As far as this work is concerned, we wanted to see how model-free we could go.

I don’t understand much of the paper but it looks awesome! I have two questions: Am I understanding it correctly that one would need to convert the internal representation to a textured triangle mesh in order to use ray tracing in the decoder stage? Is the encoder effectively similar to scene reconstruction via structure from motion?
There are many ways to parametrize the decoder. One is to constrain it to output an explicit mesh or volumetric representation and to express the rendering pipeline so that it is differentiable. The encoder will then effectively learn an "inference algorithm" to get the best output. A feedforward neural network is not enough, and recurrent computations will eventually be necessary.
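To make the "differentiable rendering pipeline" idea a bit more concrete, here is a toy sketch, not the method from the paper. It assumes a tiny volumetric representation (one list of densities per pixel ray) and a made-up absorption renderer where a pixel is 1 - exp(-sum of densities along its ray). The names `render`, `render_grad`, `loss_and_grad` and `fit` are hypothetical. Because the renderer has a closed-form gradient, an image-space error can be pushed back onto the voxels; here the volume is optimized directly by gradient descent instead of training an encoder.

```python
import math

def render(volume):
    """Toy orthographic renderer (an assumption, not the paper's decoder).
    volume: list of rays, each ray a list of non-negative densities.
    Pixel intensity is 1 - exp(-sum of densities along the ray)."""
    return [1.0 - math.exp(-sum(ray)) for ray in volume]

def render_grad(volume, upstream):
    """Backward pass of render().
    d pixel_i / d voxel_ij = exp(-sum(ray_i)) for every voxel j on ray i,
    so each voxel on a ray receives upstream[i] * exp(-sum(ray_i))."""
    return [[g * math.exp(-sum(ray))] * len(ray)
            for ray, g in zip(volume, upstream)]

def loss_and_grad(volume, target):
    """Squared error between the rendered image and the target image,
    plus its gradient with respect to every voxel."""
    image = render(volume)
    residual = [p - t for p, t in zip(image, target)]
    loss = sum(r * r for r in residual)
    grad = render_grad(volume, [2.0 * r for r in residual])
    return loss, grad

def fit(target, depth=3, steps=1000, lr=0.2):
    """Recover a volume whose rendering matches target by gradient descent.
    Densities are clamped at zero so they stay physically meaningful."""
    volume = [[0.1] * depth for _ in target]
    for _ in range(steps):
        _, grad = loss_and_grad(volume, target)
        volume = [[max(0.0, v - lr * g) for v, g in zip(ray, g_ray)]
                  for ray, g_ray in zip(volume, grad)]
    return volume
```

For example, `render(fit([0.2, 0.5, 0.8]))` should land very close to the target pixels. In a real system the hand-written gradient would come from automatic differentiation, and the volume would be produced by a learned decoder.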
Can you explain a bit more why the recurrent network structure becomes necessary at some point? Is that because reversing a CNN naturally means rendering by (de)convolution?
To approximately learn a "real" graphics engine with support for basic physics, feed-forward computation alone might not be sufficient. A more natural way to learn graphics/physics might be to learn the temporal structure more explicitly. On the other hand, it might also be interesting to simply add a temporal convolution-deconvolution structure to the existing model. This is work in progress, though.