The animations (specifically Animated Crab and Robot Animation) have quite noticeable AI art artifacts that swirl around the model in unnatural ways as the objects and camera move.
There's some discussion of render time in the paper; they compare to Blender Cycles (path tracing), and at least for their <= 4k triangle scenes the neural approach is much faster. I suspect it doesn't scale as well, though (they mention their attention runtime is quadratic in the number of tris).
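To put rough numbers on the scaling worry, here's a back-of-the-envelope sketch. It assumes the attention cost is purely quadratic in triangle count and ignores constant factors and every other part of the pipeline, so treat it as an illustration rather than a prediction of real runtimes. The function name and the 4k baseline are my own choices, not something from the paper.

```python
def relative_attention_cost(n_tris, baseline_tris=4_000):
    """Attention cost relative to the baseline scene, assuming pure O(n^2) scaling.

    Hypothetical helper: ignores constants and all non-attention work.
    """
    return (n_tris / baseline_tris) ** 2


for n in (4_000, 40_000, 400_000):
    print(f"{n:>7} tris -> {relative_attention_cost(n):,.0f}x the 4k-tri attention cost")
```

Under that assumption, 10x the triangles means 100x the attention cost, which is why a large speedup on small scenes doesn't necessarily carry over to production-sized geometry.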
I wonder if it would be practical to use the neural approach (with simplified geometry) only for indirect lighting - use a conventional rasterizer and then glue the GI on top.
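Something like the following is what I have in mind for the "glue" step. It's a minimal sketch, assuming the rasterizer gives you a direct-lighting buffer and an albedo buffer in linear color space, and the neural model gives you a lower-resolution indirect irradiance buffer computed from the simplified geometry. All names here are hypothetical, and the nearest-neighbor upsample stands in for whatever filtering you'd actually want.

```python
import numpy as np


def composite_gi(direct, indirect_lowres, albedo, scale):
    """Add neural indirect lighting on top of rasterized direct lighting.

    direct:          (H, W, 3) linear radiance from a conventional rasterizer
    indirect_lowres: (H // scale, W // scale, 3) indirect irradiance (assumed neural output)
    albedo:          (H, W, 3) diffuse albedo from the rasterizer's G-buffer
    scale:           integer upsampling factor for the indirect buffer
    """
    # Nearest-neighbor upsample of the low-res indirect buffer to full resolution.
    indirect = indirect_lowres.repeat(scale, axis=0).repeat(scale, axis=1)
    # Diffuse-only combination: direct term plus albedo-modulated indirect term.
    return direct + albedo * indirect


# Toy usage: a 4x4 frame with a 2x2 indirect buffer.
direct = np.zeros((4, 4, 3))
indirect_lowres = np.full((2, 2, 3), 0.5)
albedo = np.full((4, 4, 3), 0.8)
frame = composite_gi(direct, indirect_lowres, albedo, scale=2)
```

This only handles diffuse bounce light; glossy indirect reflections would need more than an albedo multiply.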