In this current generation, "world models" is basically a marketing term. You can research gaussian splatting, novel view synthesis, neural radiance fields (nerf), etc... I find Mr Nerf is good to follow: https://x.com/janusch_patas
There is another thing called world models that involves predicting the state of something after some action. But this is a very very limited area of research. My understanding of this is that there just isn't much data of action->reaction.
Same issue with gaussian splatting/nerf really, very little data (relative to text/images/videos) of text -> 3d splats. I'd guess what world labs are doing is text -> image -> splats, but of course it is just speculation.
> There is another thing called world models that involves predicting the state of something after some action. But this is a very very limited area of research. My understanding of this is that there just isn't much data of action->reaction.
Folks interested in this can look up Yann LeCun's work on world models and JEPA, which his team at Meta created. This lecture is a nice summary of his thinking on this space and also why he isn't a fan of autoregressive LLMs: https://www.youtube.com/watch?v=yUmDRxV0krg
There is another thing called world models that involves predicting the state of something after some action. But this is a very very limited area of research. My understanding of this is that there just isn't much data of action->reaction.
Same issue with gaussian splatting/nerf really, very little data (relative to text/images/videos) of text -> 3d splats. I'd guess what world labs are doing is text -> image -> splats, but of course it is just speculation.