by ragebol 2312 days ago
For the VFX industry, the tracking had already been solved for ages, with those reflective little balls on suits etc. in a mocap system. The Wii sensor bar's thing was that it was really cheap.

But yes, damn close to a holodeck. But you can't see depth in this setup, right?


If it's perspective corrected for the camera, it would probably look very distorted for anyone else on set -- whether there's depth or not

And that's certainly not the goal with this. Something along those lines has been around for a while (https://en.wikipedia.org/wiki/Cave_automatic_virtual_environ...). This system seems specifically targeted for solving challenges for film production, as it probably should be.

I am pretty impressed that real time rendering has gotten good enough to use for these purposes. I certainly wouldn't have expected that those backgrounds in the show were coming out of a video game engine.

They mention they cannot push enough GPU juice to the screens, so they only render the camera focus area in full resolution. Also, there is a 12-frame lag, which prevents moving the camera too fast.
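To put a rough number on that lag, here is a back-of-the-envelope sketch. The 24 fps rate and the 10 deg/s pan speed are my own assumptions for illustration, not figures from the article; the function names are made up too.

```python
def lag_seconds(lag_frames: int, fps: float) -> float:
    """Latency in seconds for a given number of frames of delay."""
    return lag_frames / fps


def pan_drift_degrees(pan_deg_per_s: float, lag_frames: int, fps: float) -> float:
    """Angle the camera has panned while the wall still shows the stale view."""
    return pan_deg_per_s * lag_seconds(lag_frames, fps)


if __name__ == "__main__":
    # Assumed values: 24 fps and a 10 deg/s pan are illustrative only.
    print(lag_seconds(12, 24.0))
    print(pan_drift_degrees(10.0, 12, 24.0))
```

Under those assumptions the full-resolution region would need enough margin around the camera's view to cover the drift, or the pan has to be slower.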
One solution to this would be to put the camera on a fixture that replays the same motions every time, so they can do a 'dry run' and correct the rendering. (Putting a camera on a fixture is not new, IIRC they did it in Back to the Future - https://www.youtube.com/watch?v=AtPA6nIBs5g is a really good yet succinct documentary on it)
IIRC they did it in Star Wars for the space battles (but it is forty years since I read about it, so my memory may be playing tricks on me).
They did. They only had so many models of the spaceships, so multiple takes with a programmed camera path and compositing were used to increase the number of vessels in the scene.
> they cannot push enough GPU juice to the screens

...yet. That's just a matter of waiting a few more years.

I wonder if most of the next Star Wars movies will be shot with this tech.

I don’t fully get this. They could just employ different computers to render different parts of their cave. It seems more like a cost savings thing than a technical limitation.

And I’m not sure why you’d skimp on a few PCs if you’re already building a humongous LED wall, so maybe there’s something I’m missing.

I worked on a somewhat similar project in 2015, though not as complex as this, to build background videos for DotParty, using UE4 for panoramas and then stitching them. One of the biggest issues we found was that, because this is a game engine, a lot of things are not deterministic, so if we used multiple cards or computers, particles and other environmental effects would not be in sync, and the stitches were glaring.
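For anyone wondering what "deterministic" would have to mean here, a minimal sketch, not how UE4 does it: if every render node derives particle state only from shared inputs (a seed, the frame number, the particle index), two machines get identical results without talking to each other. The function name and the stateless particle model are assumptions for illustration; real particle systems carry simulation state from frame to frame, which is where it gets hard.

```python
import random


def particle_positions(seed: int, frame: int, count: int) -> list[tuple[float, float]]:
    """Return particle positions that depend only on (seed, frame, index)."""
    positions = []
    for index in range(count):
        # Derive a per-particle generator from shared inputs only,
        # never from wall-clock time or node-local state.
        rng = random.Random(f"{seed}:{frame}:{index}")
        positions.append((rng.random(), rng.random()))
    return positions


if __name__ == "__main__":
    node_a = particle_positions(seed=42, frame=100, count=3)
    node_b = particle_positions(seed=42, frame=100, count=3)
    print(node_a == node_b)
```

Anything seeded from the local clock or advanced by a variable timestep breaks this property, and the seams between outputs show it.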
Yeah, that’s a good point. And taking out the particles and doing those separately is probably near impossible.

For the non-visible screens it wouldn’t matter that much, but they’d still end up with the moving frustum for the main engine.

I believe it's been improved in later versions, as they've focused on these use cases, and there might even be deterministic particles now, but I'm not sure, because I've been out of the VFX market for quite a while.
The part you are missing is the insane complexity involved in keeping perfect frame sync with low latency across GPUs and machines, especially when some final compositing of partial outputs is involved. The stuff sounds simple on paper and it looks like you can just go buy the tech, unpack it and switch it on. The reality is nothing like that. The off-the-shelf tech is fiddly and barely stable because it is always a low-priority feature added with the least possible effort.
If you have a 12 frame delay regardless you have an awful lot of time to get your clocks in sync.

Obviously that tech is not simply unpackable, because they’re on the cutting edge. But that’s also why you could expect some customization.

The overall latency says nothing about the sync precision required. The displays need to be synced and the graphics cards need to have their vsync synchronized between them (usually via dedicated hardware). If the displays are out of sync, you immediately get visible tearing at the seams.

If your parallel renderer divides the image along a grid that does not correspond to display boundaries, you need to gather and composite the partial framebuffers after rendering them. This means that you're now sending frames across the network and you need to take care that you aren't compositing frames from different timesteps, for example, because the part of the rendered framebuffer that goes to compositor/display node A arrived in time, but the part going to compositor/display node B somehow didn't.
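A minimal sketch of the "don't mix timesteps" bookkeeping, under the assumption that every partial framebuffer is tagged with a frame number. `TileCompositor` and its methods are hypothetical names, not any real product's API, and this ignores the hard parts (hardware vsync lock, network jitter, deadlines).

```python
from collections import defaultdict


class TileCompositor:
    """Collect partial framebuffers and release a frame only when complete."""

    def __init__(self, expected_tiles: int) -> None:
        self.expected_tiles = expected_tiles
        self.last_released = -1
        self.pending: dict[int, dict[int, bytes]] = defaultdict(dict)

    def submit(self, frame_id: int, tile_id: int, pixels: bytes):
        """Store one tile; return the ordered tiles once the frame is complete."""
        if frame_id <= self.last_released:
            return None  # late tile for a frame already shown or skipped
        tiles = self.pending[frame_id]
        tiles[tile_id] = pixels
        if len(tiles) < self.expected_tiles:
            return None  # still waiting; never mix in tiles from another frame
        del self.pending[frame_id]
        # Drop older incomplete frames so they can't be shown out of order.
        for stale in [f for f in self.pending if f < frame_id]:
            del self.pending[stale]
        self.last_released = frame_id
        return [tiles[t] for t in sorted(tiles)]


if __name__ == "__main__":
    comp = TileCompositor(expected_tiles=2)
    comp.submit(1, 0, b"left-1")          # frame 1 is missing its right half
    comp.submit(2, 0, b"left-2")
    print(comp.submit(2, 1, b"right-2"))  # frame 2 is complete and released
    print(comp.submit(1, 1, b"right-1"))  # too late, dropped
```

The cost of this policy is a skipped frame whenever one tile is late, which on a wall is exactly the kind of hitch you are trying to avoid.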

“They could just...”

Pretty much every time I’ve thought this, it’s turned out I was underestimating the difficulty of doing “just” that.

If it were just that easy, wouldn’t they have done that already?

Who knows? Sometimes people make things a lot more difficult than they have to be.

That isn’t always the case, but asking the question is better than the alternative.

From the video posted upthread, around 3:40 it looks like they're doing just that: https://youtu.be/gUnxzVOs3rk?t=220
The most interesting aspect to me is that the system is pulling double duty by displaying both a dynamic, perspective-correct backdrop for the camera's POV and a static view for environment lighting and reflections outside of the camera's view frustum.

I wonder if they had to take care to mitigate artifacts caused by the dynamic view bouncing off of surfaces facing away from the camera.

What artifacts?
I guess he is thinking about situations where you would use a negative fill
What does that mean?
Look at the ceiling, you can see it moving.
Mocap systems haven't really been able to produce deliverable results without human intervention for very long. I'd argue they're still not there; some filtering and cleanup of the data is usually required. A lot of VFX is still about throwing human labour at problems.

Edit: I should note I'm talking about motion capture for characters etc. Capturing the motion of a rigid object like a camera in a controlled environment is very doable.

You're right. The mocap data for the actor who played Thanos in Avengers had to be post-processed by artists by hand, since the motion of a regular man doesn't correspond to the motion of a larger and heavier Titan. I guess in a few years all that could be automated with ML.
You don't need ML to solve that problem, you can use inverse kinematics:

https://youtu.be/KLjTU0yKS00
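As a toy illustration of what inverse kinematics means here (not what the linked video or any production pipeline does): given where a hand or foot should end up, solve for the joint angles of a limb with different bone lengths. This is the textbook planar two-bone solution; the function names are my own.

```python
import math


def two_bone_ik(target_x, target_y, upper_len, lower_len):
    """Return (shoulder, elbow) angles in radians for a planar two-bone chain."""
    dist = math.hypot(target_x, target_y)
    # Clamp to the reachable range so acos stays defined.
    dist = max(abs(upper_len - lower_len), min(upper_len + lower_len, dist))
    # Law of cosines for the elbow bend (0 = fully straight).
    cos_elbow = (dist**2 - upper_len**2 - lower_len**2) / (2 * upper_len * lower_len)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Shoulder angle = direction to target minus the offset caused by the bend.
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        lower_len * math.sin(elbow), upper_len + lower_len * math.cos(elbow)
    )
    return shoulder, elbow


def forward(shoulder, elbow, upper_len, lower_len):
    """Forward kinematics, used to check where the chain's tip ends up."""
    x = upper_len * math.cos(shoulder) + lower_len * math.cos(shoulder + elbow)
    y = upper_len * math.sin(shoulder) + lower_len * math.sin(shoulder + elbow)
    return x, y


if __name__ == "__main__":
    # Same target, longer bones: the angles change, the tip still lands on it.
    for upper, lower in [(1.0, 1.0), (1.5, 1.2)]:
        s, e = two_bone_ik(1.0, 1.0, upper, lower)
        print(forward(s, e, upper, lower))
```

Getting the tip to the right place is the easy part; making a heavier character move with believable weight and timing is a different problem.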

They have polarized 3D screens, and with head tracking, you have it.
Was thinking the same, but that'll also work for just 1 PoV
You could multiplex more images if the glasses were synced and actively polarized. Every person gets a timeslice of the total image.
Or VERY high framerates.
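Rough arithmetic for the timeslice idea, assuming the display's refresh is split evenly between viewers (and between eyes, if stereo). The 240 Hz figure is a made-up example, not a spec of the wall in the article.

```python
def per_viewer_hz(display_hz: float, viewers: int, stereo: bool = False) -> float:
    """Refresh rate each eye of each viewer sees when frames are time-sliced."""
    slices = viewers * (2 if stereo else 1)
    return display_hz / slices


if __name__ == "__main__":
    # Assumed 240 Hz wall shared by two viewers in stereo.
    print(per_viewer_hz(240.0, viewers=2, stereo=True))
```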
It won't trick anyone with stereo vision, but the perspective correction provides all the depth cues needed to create a convincing 3D visual for pirates and non-stereo cameras.
Interestingly I would imagine pulling the non-CG part out of the frame would be possible as they have the ability to generate the exact image without the real-world aspects. Basically a virtual green-screen. Combine that with a stereo camera and the fact that the source CG is actually 3D and you could get very convincing 3D movies I'd think. Yeah, that simplified a lot, but I'd say it's possible. And they still get the benefits of the lighting aspects as well as the immersion for the actors.
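A very simplified sketch of that virtual green-screen idea, assuming you have the rendered backdrop as seen from the camera and the captured frame as same-sized grids of RGB tuples. `difference_matte` and the threshold value are hypothetical; a real capture would differ from the render in color response, moiré and spill, so plain differencing would need a lot more work.

```python
def difference_matte(captured, rendered, threshold=30):
    """Mark pixels as foreground where the capture differs from the known backdrop."""
    matte = []
    for cap_row, ref_row in zip(captured, rendered):
        matte.append([
            # Foreground if any channel differs by more than the threshold.
            1 if max(abs(c - r) for c, r in zip(cap_px, ref_px)) > threshold else 0
            for cap_px, ref_px in zip(cap_row, ref_row)
        ])
    return matte


if __name__ == "__main__":
    rendered = [[(10, 10, 10), (200, 0, 0)]]
    captured = [[(12, 9, 10), (90, 80, 70)]]  # second pixel is covered by an actor
    print(difference_matte(captured, rendered))
```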
pirates? What arrrr you talking about?
Eyepatches.