> I cannot be the first person to think about such possibilities
Differentiable Rendering [1] is the closest thing to what you are describing.
And yes, people have been working on this for the same reason you outline: it is more data/compute efficient and hence should generalize better.
[1] https://blog.qarnot.com/article/an-overview-of-differentiabl...
But also:
> While cool, this also seems utterly wasteful. Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.
This is a bit like the debate around people calling LLMs a "blurry JPEG of the web" and hence useless. Yes, this is a statistical approximation to an analytical problem... but that's a very reductive framing of what is going on.
Finding the symbolic/analytical solution here would require constraining the problem greatly: not everything on the screen has a differentiable representation; for example, complex simulations might involve some kind of custom internal loop/simulation. Yes, you spend more compute, but you get a solution that can simply be trained on billions of unlabeled (synthetic) examples and then generalize to previously unseen prompts/environments.
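As a toy illustration of the differentiable-rendering idea, and of why not everything on screen admits it, here is a minimal Python sketch. It is not taken from the linked article. It assumes a hypothetical 1D "screen" of 32 pixels showing a single bar of width 6. A hard-edged rasterizer is a step function of the bar's position, so its gradient is zero almost everywhere. Smoothing the edges with a sigmoid, one common way of making rasterization differentiable, yields a usable gradient, and gradient descent can then recover the bar's position from a target image. All names (`hard_render`, `soft_render`, `soft_grad`, `fit_position`) and constants are illustrative assumptions.

```python
import math

WIDTH = 6.0      # bar width in pixels (illustrative assumption)
N_PIXELS = 32    # size of the toy 1D "screen" (illustrative assumption)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hard_render(c: float) -> list[float]:
    """Hard rasterization: a pixel is lit iff its centre lies inside the bar."""
    return [1.0 if abs(i + 0.5 - c) < WIDTH / 2 else 0.0 for i in range(N_PIXELS)]


def soft_render(c: float, softness: float = 1.0) -> list[float]:
    """Differentiable rasterization: both bar edges smoothed with sigmoids."""
    return [
        sigmoid((i + 0.5 - c + WIDTH / 2) / softness)
        - sigmoid((i + 0.5 - c - WIDTH / 2) / softness)
        for i in range(N_PIXELS)
    ]


def loss(image: list[float], target: list[float]) -> float:
    """Sum of squared pixel differences."""
    return sum((p - t) ** 2 for p, t in zip(image, target))


def fd_grad(render, c: float, target: list[float], eps: float = 1e-4) -> float:
    """Central finite-difference estimate of d(loss)/dc for any renderer."""
    return (loss(render(c + eps), target) - loss(render(c - eps), target)) / (2 * eps)


def soft_grad(c: float, target: list[float], softness: float = 1.0) -> float:
    """Analytic d(loss)/dc for soft_render, via the chain rule."""
    g = 0.0
    for i, t in enumerate(target):
        sa = sigmoid((i + 0.5 - c + WIDTH / 2) / softness)  # left edge
        sb = sigmoid((i + 0.5 - c - WIDTH / 2) / softness)  # right edge
        # sigmoid'(z) = s * (1 - s), and each edge argument has d/dc = -1/softness
        d_pixel = (sb * (1 - sb) - sa * (1 - sa)) / softness
        g += 2.0 * ((sa - sb) - t) * d_pixel
    return g


def fit_position(target: list[float], c0: float, lr: float = 0.5, steps: int = 200) -> float:
    """Recover the bar position from a rendered image by gradient descent."""
    c = c0
    for _ in range(steps):
        c -= lr * soft_grad(c, target)
    return c


if __name__ == "__main__":
    # Hard edges: a tiny shift of the bar changes no pixel, so this prints 0.0.
    print(fd_grad(hard_render, 10.0, hard_render(14.0)))
    # Soft edges: the gradient is negative, i.e. it points toward the target at 14.
    print(fd_grad(soft_render, 10.0, soft_render(14.0)))
    # Gradient descent from c = 10 should land very close to 14.
    print(fit_position(soft_render(14.0), c0=10.0))
```

The smoothing is easy here because the scene is a single bar; the `softness` parameter trades image sharpness for gradient signal.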