by pistachiopro 1552 days ago
This is indeed rendered in realtime, but one thing to note is it's a "4D" capture, more-or-less meaning each frame of the animation is its own asset. This makes it possible to reproduce subtle physics like the lips sticking together slightly when the actor opens her mouth. The amount of storage space, alone, makes this impractical for anything other than demos. Unity claims they will be able to achieve this level of fidelity using a deep learning-based compression that will allow stuff like this to appear in game cutscenes, but all the movements will still be pre-baked. The only interaction possible will be moving the camera. At that point the technology will be very useful, but it's still a ways away from having such a realistic character that can react to you dynamically.

(Though whether that's just a couple years of software technology progress, or a decade+ for hardware progress, who can say?)
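To get a feel for the storage claim, here is a back-of-envelope sketch. Every number in it (vertex count, frame rate, clip length) is a hypothetical placeholder, not a figure from the demo, and it counts only raw vertex positions, ignoring textures, normals, and any compression.

```python
def raw_capture_bytes(vertices, fps, seconds, components=3, bytes_per_component=4):
    """Uncompressed size of storing one position per vertex for every frame."""
    frames = fps * seconds
    return vertices * components * bytes_per_component * frames

# Hypothetical inputs: 100k vertices, 30 fps, one minute of animation.
size = raw_capture_bytes(vertices=100_000, fps=30, seconds=60)
print(f"{size / 1e9:.2f} GB")  # prints "2.16 GB"
```

Under those assumptions a single minute of one face already costs gigabytes for geometry alone, which is the scaling problem the comment is pointing at.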

6 comments

>achieve this level of fidelity using a deep learning-based compression

What does that mean? To me, they might just as well have said middle-out compression.

I'd guess a 4d version of: https://paperswithcode.com/method/nerf
This might be what you’re looking for: https://gafniguy.github.io/4D-Facial-Avatars/
I guess this means dimensionality reduction for example with the use of a convolutional autoencoder.
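A minimal sketch of the dimensionality-reduction idea, using plain linear PCA (via SVD) as a stand-in for a convolutional autoencoder. This is not Unity's or Ziva's method; the function names and the synthetic data are made up for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def fit_basis(frames, k):
    """frames: (num_frames, num_vertices * 3) array of flattened vertex positions."""
    mean = frames.mean(axis=0)
    # Rows of vt are the principal directions of the centered frames.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]

def compress(frames, mean, basis):
    # Each frame becomes k coefficients instead of num_vertices * 3 floats.
    return (frames - mean) @ basis.T

def decompress(codes, mean, basis):
    return codes @ basis + mean

# Synthetic stand-in for a capture: 200 frames, 500 vertices, 5 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 5))
shapes = rng.normal(size=(5, 500 * 3))
frames = factors @ shapes + rng.normal(size=(1, 500 * 3))

mean, basis = fit_basis(frames, k=5)
codes = compress(frames, mean, basis)
restored = decompress(codes, mean, basis)
```

The per-frame data shrinks from 1500 floats to 5 coefficients, plus a shared mean and basis. An autoencoder plays the same role with a nonlinear encoder and decoder, which is presumably what lets it capture effects a linear basis cannot.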
Unity recently acquired Ziva, which specializes in the detailed animation of humans and other animals. They were known for their (not realtime) physics-based solutions, but now they have an ML model for faces, apparently. As far as I know, it's still in beta and not widely available. Unity says they will re-release this demo with the Ziva face in a matter of weeks and the quality will be even higher. And possibly allowing interactivity as well?? I guess we'll see in a few weeks.
Superresolution. You have a lower-resolution animation (fewer pixels = fewer calculations) and then use superresolution to turn that into a 4K image. This is reality right now for NVIDIA GPUs (I think it's called DLSS).
They are talking about compressed geometry, not pixels. This is more similar to alembic and other geometry streaming tech https://en.wikipedia.org/wiki/Alembic_(computer_graphics)

There is one out there from 5 years ago or so that is similar to Google's Seurat but for animated stuff; I think it pre-bakes triangle culling for different views within a limited volume. I can't remember its name. From the details I remember (there was a realistic orangutan or something like that rendered with fur), I should be able to find it on Google, but Google search has become degraded recently.

Nvidia DLSS is an important part of how they achieved 30Hz at 4k resolution, but that's more of a shading assist and doesn't affect the animation. The facial animation will be compressed with Ziva's ML solution.
If it's only for cutscenes why not just have a video?
Cutscenes work a lot better (more immersive) if they can correctly reflect runtime-defined assets, e.g. your own character with your customizations, gear and clothes, etc, or the dynamic state of the environment in which gameplay was happening: destruction debris, current time of day, and such.
Plus cutscenes get a lot bigger when you’re doing 4k60 and not 1080p30
Cause they want to push the limits and make their engine look amazing. Also, if they research hard enough, in-game becomes nearly as good as video to the point you can’t tell.

One baby step at a time.

Because actors are expensive. Reshoots are even more of a cost if things change.

With this, you just have the character do exactly what you want, when you want, without needing to talk to an agent.

Video doesn't mean with a camera, just pre-rendered.
Because video cutscenes always look crap 5 years later. Easily distinguishable from in-game rendered.

Of course most games don't care about 5 years later, but it still looks crap.

This is a fair point.
Movement won't be pre-baked: a physics engine sim will be baked into the neural network, and movements will be another dimension for the deep learning network. And then all of that will be baked into an agent that has been trained to carry out motives (with a simulation of your character, etc). The same applies to speech as movement. And the deep-learned compression rate will be magnificent.
What led you to this bold prediction of the future?
No predictions, just an explainer of how AI agents are trained. For instance, RL is about presenting an environment via rules (gravity, etc), and letting the agent learn its way around, thus discovering what it can and cannot do (a policy for the environment).
You didn't explain how anything actually works, you gave a very crude prediction with a lot of holes of how you think something will work in the future.
[citation required]
This type of tech is already heavily used by the film industry, though; dynamically reacting is not much of a concern there at all.
They also say

> Tension tech for blood flow simulation and wrinkle maps, eliminating the need for a facial rig for fine details

Which sounds like it is not 100% prebaked animation?

The geometry is fully pre-baked, but wrinkle and blood details respond in realtime to the pre-baked geometry.
So, essentially super fancy sprites?

chuckles