by kilpikaarna 382 days ago
> The coolest thing here might be the speed: for a given scene RenderFormer takes 0.0760 seconds while Blender Cycles takes 3.97 seconds (or 12.05 secs at a higher setting), while retaining a 0.9526 Structural Similarity Index Measure (0-1 where 1 is an identical image). See tables 2 and 1 in the paper.

This sounds pretty wild to me. I scanned through the paper quickly but couldn't find any details on how they set this up. Did they run Cycles on the CPU or with the CUDA backend on an A100? Also, if this is rendering single frames, an appreciable fraction of the 3.97 s might go into starting up the renderer. Time per frame would drop when rendering a sequence.

And the complexity scaling per triangle mentioned in a sibling comment. Ouch!


This reads like they used the GPU with Cycles:

  "Table 2 compares the timings on the four scenes in Figure 1 of our
  unoptimized RenderFormer (pure PyTorch implementation without
  DNN compilation, but with pre-caching of kernels) and Blender Cycles
  with 4,096 samples per pixel (matching RenderFormer’s training
  data) at 512 × 512 resolution on a single NVIDIA A100 GPU."
> Blender Cycles with 4,096 samples per pixel (matching RenderFormer’s training data)

This seems like an unfair comparison. It would be a lot more useful to know how long Blender would take to reach the same 0.9526 Structural Similarity Index Measure against the training data. My guess is that with the denoiser turned on, something like 128 samples would be enough, or maybe even fewer on some images. At that point, on an A100 GPU, Blender would be close to, if not beating, the times reported here for these scenes.
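For a rough sense of scale, here is a back-of-the-envelope sketch of that guess. It assumes Cycles render time scales linearly with samples per pixel, which is my assumption and not something the paper states, and it ignores denoiser cost. The `fixed_overhead_s` parameter is a hypothetical per-frame startup cost that does not scale with samples; the only measured numbers are the two timings quoted above.

```python
# Timings quoted in the thread (seconds per frame, 512x512, single A100).
RENDERFORMER_S = 0.0760
CYCLES_S_AT_4096_SPP = 3.97
REFERENCE_SPP = 4096


def estimated_cycles_time(spp, fixed_overhead_s=0.0):
    """Scale the quoted 4,096-spp Cycles time to `spp` samples per pixel.

    Assumes sampling time is linear in spp (an assumption, not a measurement).
    `fixed_overhead_s` is a hypothetical per-frame cost (scene sync, startup)
    that stays constant regardless of the sample count.
    """
    sampling_s = CYCLES_S_AT_4096_SPP - fixed_overhead_s
    return fixed_overhead_s + sampling_s * spp / REFERENCE_SPP


if __name__ == "__main__":
    for spp in (100, 128, 200, 1024):
        t = estimated_cycles_time(spp)
        ratio = t / RENDERFORMER_S
        print(f"{spp:>5} spp: ~{t:.3f} s, {ratio:.1f}x the RenderFormer time")
```

With zero fixed overhead, 128 samples works out to roughly 3.97 / 32 seconds, which is the same order of magnitude as the RenderFormer figure. Whether 128 denoised samples actually reaches the same SSIM is something only a real benchmark can answer.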

Nobody runs 4096 samples per pixel. In many cases 100-200 (or even fewer with denoising) are enough. You might go up to the low thousands if you want to resolve caustics.