Could anyone with more background elaborate on what you think Carmack means by the second tweet? The PS4 already outputs 1080p with (varying algorithms of) antialiasing for several of its launch titles. I'd expect some rendering pipeline shortcuts to cut down on the latency that some operations incur and to achieve a constant 60 FPS, but a generational drop in graphical fidelity seems steep.
Many games on PS4 do NOT run at full 1080p resolution; they are rendered at a lower resolution and upscaled. On top of that, several PS4 games do not run at 60 fps either. Performance-wise, the new consoles are not THAT great, which is why the Oculus Rift team has specifically said the new consoles will not be good enough for where they want to take VR.
Stereo rendering requires rendering the scene twice, once for each eye. Off the bat, this is almost twice as computationally intensive as rendering it once.
It's rendering it twice, but at only half the resolution.
Thinking about this more, the closest analog to stereo rendering would seem to be games with a split-screen mode. Which, in my experience, often does entail a performance hit (though certainly not a generation-sized one).
My guess is that how much a second viewport affects performance is highly engine-dependent. Ironically, the one game I remember having the biggest delta from single-screen to split-screen is the Dreamcast version of Quake III: I distinctly recall the sharp drop in poly-count on the otherwise curvy rocket launcher.
Rendering the scene twice has overhead, so it would take extra time even if it was the same number of pixels total. But that's far from the only reason VR requires more power.
Due to the distortion caused by the lenses, the scene must be rendered at about 1.4x the normal resolution. Then there's a warping step that performs the inverse of the lens distortion, which is an additional cost. Also, good VR requires rendering at 90 Hz, not 60 Hz, so that's another 1.5x. Furthermore, frame tearing artifacts and FPS hiccups are much worse in VR, so you need extra headroom to eliminate them even in worst-case scenarios.
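To put rough numbers on how those factors compound, here is a back-of-the-envelope sketch. The 1920x1080 baseline and the `pixel_rate` helper are my own assumptions for illustration. It is also not clear whether the 1.4x applies to total pixel count or to each axis, so both readings are computed. The warp pass, the duplicated scene traversal, and the extra headroom are not modeled at all.

```python
def pixel_rate(width, height, hz, resolution_scale=1.0):
    """Shaded pixels per second for a given render target and refresh rate."""
    return width * height * resolution_scale * hz


# Assumed baseline: a 1080p game running at 60 Hz.
baseline = pixel_rate(1920, 1080, 60)

# Reading 1: the 1.4x figure scales the total pixel count.
vr_total = pixel_rate(1920, 1080, 90, resolution_scale=1.4)

# Reading 2: the 1.4x figure scales each axis, so pixel count grows by 1.4 ** 2.
vr_per_axis = pixel_rate(1920, 1080, 90, resolution_scale=1.4 ** 2)

ratio_total = vr_total / baseline
ratio_per_axis = vr_per_axis / baseline

print(f"1.4x total pixels at 90 Hz: {ratio_total:.2f}x the baseline pixel rate")
print(f"1.4x per axis at 90 Hz:     {ratio_per_axis:.2f}x the baseline pixel rate")
```

Under either reading, the pixel throughput alone comes out at somewhere between two and three times the baseline, before counting anything else on the list.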
I remember being surprised that the bottleneck for some modern devices (like cell phones) wasn't the number of vertices one could push, but rather the fill rate and number of draw calls. Do consoles these days hit performance bounds as a function of the number of draw calls rather than the number of polygons/vertices?
I ask because I can completely understand how doubling the number of draw calls could be problematic in a VR situation.
Mobile devices are often limited by memory bandwidth and by OpenGL driver overhead. Consoles have very little driver overhead, so the number of draw calls is not as big a problem, but state changes are still costly, so doubling the number of state changes hurts performance. I could imagine some clever techniques to avoid doubling the number of state changes (perhaps a geometry shader that duplicates triangles to render both eyes at once), but I don't know how well that would work. VR rendering is still mostly unexplored!
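A toy model of why draw ordering matters for state changes, not an actual renderer and not the geometry shader idea itself: it just counts how often the bound state differs from the previous draw. The state names and the `count_state_changes` helper are hypothetical, and the model assumes that switching between eyes costs nothing, which would only hold if something like per-draw duplication handled both eyes.

```python
def count_state_changes(draws):
    """Count how often the bound state differs from the one used by the previous draw."""
    changes = 0
    current = None
    for state, _eye in draws:
        if state != current:
            changes += 1
            current = state
    return changes


# Hypothetical render states (shader/material bindings), one draw each.
states = ["terrain", "character", "weapon", "particles"]
eyes = ("left", "right")

# Naive stereo: run the whole frame once per eye.
per_eye_passes = [(s, eye) for eye in eyes for s in states]

# Alternative ordering: bind each state once, then draw for both eyes.
both_eyes_per_state = [(s, eye) for s in states for eye in eyes]

naive_changes = count_state_changes(per_eye_passes)
shared_changes = count_state_changes(both_eyes_per_state)

print(naive_changes, shared_changes)
```

In this model the per-eye ordering rebinds every state twice, while the shared ordering binds each state once. Whether real hardware sees that saving depends on how expensive the per-eye viewport and view-matrix switch is, which this sketch deliberately ignores.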
DK2 isn't good enough. That's why it isn't the consumer version already. Palmer has said in interviews that the consumer version will be higher resolution and higher frame rate. The higher frame rate is required to enable low enough persistence without flickering.