| I work in game physics in the AAA industry, and I have studied and experimented with ML on my own. I'm sceptical that that's going to happen. Imagine you want to build a model that renders a scene with the same style and quality as rasterisation. The fastest way to project a point onto the screen is a single matrix multiplication. If the model needs to keep the same level of spatial consistency as the rasteriser, it has to reproject points in space somehow. But a model is made of a huge number of matrix multiplications interspersed with non-linear activations. Because of these non-linearities, it can't simply collapse its own layers into the one matrix multiplication that the projection needs. It has to recover the linearity by approximating the transformation with many more operations. Now, I know that transformers can exploit superposition when processing a lot of data. I also know neural networks could come up with all sorts of heuristics and approximations based on distance or other criteria.
However, I've read multiple papers showing that large models have a large number of useless parameters (the most recent one I read showed that their model could be reduced to just 4% of its original parameters, but the process they used requires re-training the model from scratch many times in a deterministic way, so it's not practical for large models). This doesn't mean we won't end up using them for real-time rendering anyway. We could accept the trade-off and give up some coherence for more flexibility.
Or, given enough computational power, a larger model could be coherent enough for the human eye, with its much larger cost justified by its flexibility. It's similar to how analogue systems are much faster than digital ones, but we use digital ones anyway because they can be reprogrammed. With frame prediction and upscaling, we already make this trade-off. |
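
To make the point about non-linearities concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than anything from a real renderer or model: the `PROJECTION` matrix is a toy pinhole projection with focal length 1, and `W1`/`W2` are two made-up 2x2 "layers". It shows that two stacked linear layers collapse into one matrix, while the same layers with a ReLU in between do not.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def relu(v):
    """Element-wise ReLU, standing in for any non-linear activation."""
    return [max(0.0, x) for x in v]


# Toy pinhole projection (assumed, focal length 1): the last row copies z into w.
PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]


def project(point):
    """One matrix multiplication followed by the perspective divide."""
    x, y, _z, w = matvec(PROJECTION, point)
    return (x / w, y / w)


# Two made-up linear layers and their product.
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[2.0, 0.0], [-1.0, 1.0]]
W_COMBINED = matmul(W2, W1)


def two_linear_layers(v):
    """Purely linear stack: equivalent to a single multiply by W_COMBINED."""
    return matvec(W2, matvec(W1, v))


def two_layers_with_relu(v):
    """Same weights with an activation in between: no single matrix matches it."""
    return matvec(W2, relu(matvec(W1, v)))


v = [1.0, 2.0]
print(project([2.0, 4.0, 2.0, 1.0]))   # screen-space position of a homogeneous point
print(two_linear_layers(v), matvec(W_COMBINED, v))
print(two_layers_with_relu(v))
```

For the input `[1.0, 2.0]` the first layer produces a negative component, so the ReLU clips it and the result diverges from `W_COMBINED` applied directly. A network that needs an exact projection has to avoid or undo that clipping, which is where the extra operations come from.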