This is pretty wild, and I can see some really cool consequences for gaming.
Step 1: Have a Stable Diffusion model generate an image of your character.
Step 2: Use DragGAN or some other model to animate the 2D image.
Step 3: Use Nvidia's Neuralangelo to turn the animated 2D image into a proper 3D model.
Step 4: Use the model at the bottom of this article to rig an animation skeleton for your model using only text input like "swing a sword while running forward".