

by 0x02A 864 days ago

Thanks for trying it out! I haven't had the opportunity to benchmark this on an Intel MacBook. Were you able to see which kernel takes the most time? There should be a performance graph if you have the GUI enabled.

For my Apple Silicon benchmarks, the main bottleneck is the parallel radix sort that sorts the Gaussians by tile and depth. I used some shaders from a sorting library, but it lags behind SOTA parallel sort algorithms. I think fixing this would give a 1.5x overall performance boost and maybe 3x on MacBooks. Also, the wave size isn't tuned for different GPUs.
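To make "sorts the Gaussians by tile and depth" concrete, here is a minimal CPU-side sketch in C++, not the project's actual shader code. The key layout (tile index in the high 32 bits, raw depth bits in the low 32 bits) and the names `make_sort_key` and `radix_sort_pairs` are assumptions made for illustration. A GPU version would run the same histogram, prefix-sum, and scatter passes in parallel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical key layout: tile index in the high 32 bits, depth in the low 32.
// Reinterpreting the float bits keeps the ordering correct only for depth >= 0,
// which holds for Gaussians in front of the camera.
inline std::uint64_t make_sort_key(std::uint32_t tile_id, float depth) {
    std::uint32_t depth_bits = 0;
    std::memcpy(&depth_bits, &depth, sizeof(depth_bits));
    return (static_cast<std::uint64_t>(tile_id) << 32) | depth_bits;
}

// Stable LSD radix sort of (key, value) pairs, 8 bits per pass.
// Each pass is: histogram, exclusive prefix sum, scatter.
inline void radix_sort_pairs(std::vector<std::uint64_t>& keys,
                             std::vector<std::uint32_t>& values) {
    const std::size_t n = keys.size();
    std::vector<std::uint64_t> keys_tmp(n);
    std::vector<std::uint32_t> values_tmp(n);
    for (int shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 257> offsets{};
        // Histogram: count keys per digit, shifted by one slot.
        for (std::size_t i = 0; i < n; ++i) {
            ++offsets[((keys[i] >> shift) & 0xFF) + 1];
        }
        // Prefix sum: offsets[d] becomes the first output slot for digit d.
        for (std::size_t d = 0; d < 256; ++d) {
            offsets[d + 1] += offsets[d];
        }
        // Scatter in input order, which keeps the sort stable.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = offsets[(keys[i] >> shift) & 0xFF]++;
            keys_tmp[dst] = keys[i];
            values_tmp[dst] = values[i];
        }
        keys.swap(keys_tmp);
        values.swap(values_tmp);
    }
}
```

After sorting, all entries for one tile are contiguous and ordered front to back, so each tile only needs a start and end index into the sorted list.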

Another area of improvement is better management of the shared memory. Right now, we just let the driver manage it as the L1 cache. However, we could manage it manually and group Gaussian retrievals together for the same tile. This is what the official implementation does.
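As a rough illustration of grouping Gaussian retrievals per tile, the C++ sketch below simulates the pattern on the CPU. It is a simplified model, not the official implementation's code: the `Splat` struct, the `kBatch` size, and `blend_tile_batched` are hypothetical, and every pixel sees the same alpha, whereas a real rasterizer evaluates the Gaussian per pixel. The `staged` array stands in for workgroup shared memory.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Simplified, hypothetical per-Gaussian payload (single-channel color).
struct Splat {
    float alpha;
    float color;
};

// Stand-in for the workgroup size: how many splats are staged per batch.
constexpr std::size_t kBatch = 4;

// Front-to-back blend of the splats in [begin, end) for every pixel of one tile.
// Splats are first copied into a small staging buffer (the "shared memory"),
// then all pixels of the tile consume that batch before the next one is loaded.
inline std::vector<float> blend_tile_batched(const std::vector<Splat>& sorted_splats,
                                             std::size_t begin, std::size_t end,
                                             std::size_t pixels_in_tile) {
    std::vector<float> color(pixels_in_tile, 0.0f);
    std::vector<float> transmittance(pixels_in_tile, 1.0f);
    std::array<Splat, kBatch> staged{};

    for (std::size_t base = begin; base < end; base += kBatch) {
        const std::size_t count = std::min(kBatch, end - base);
        // On a GPU, each thread would load one splat here, followed by a barrier.
        std::copy_n(sorted_splats.begin() + static_cast<std::ptrdiff_t>(base),
                    count, staged.begin());
        // Every pixel in the tile reads the staged batch instead of global memory.
        for (std::size_t p = 0; p < pixels_in_tile; ++p) {
            for (std::size_t j = 0; j < count; ++j) {
                color[p] += transmittance[p] * staged[j].alpha * staged[j].color;
                transmittance[p] *= 1.0f - staged[j].alpha;
            }
        }
    }
    return color;
}
```

The point of the pattern is that each splat is fetched from global memory once per tile rather than once per pixel, which matters when the workload is memory bound.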

Although 3DGS is the first radiance field with SOTA quality that runs in real-time, I think it's still quite heavy. Due to the explicit representation of the scene, a lot of operations are memory bound. If you can't get an interactive frame rate right now, it's unlikely the improvements will make a material difference.

Hopefully that's where your work on compression comes in and solves the problem :)

1 comments

w-m 864 days ago

For some reason the GUI is not showing up for me in 3DGS.cpp. (I checked out the repo, made a build folder and built with Ninja, then launched ./apps/viewer/vulkan_splatting_viewer).

I still have a VulkanSplatting build from this Wednesday though. In VulkanSplatting, when looking at the Lego scene from above with the default window size, I'm getting just below 1 ms for the sorting kernel and just above 1 ms for "render", everything else is too small in the graph to register. But it only displays at a handful of fps, so it seems quite some time goes unaccounted for.

Maybe spinning up Instruments could give some more insights into what's happening? I tried `cmake -G Xcode` to have that setup easily, but the Xcode CMake generation fails with

    CMake Error in src/shaders/CMakeLists.txt:
      The custom command generating
    
        3DGS.cpp/build-xcode/shaders/shaders.h
    
      is attached to multiple targets:
    
        shaders
        xcode_shaders
    
      but none of these is a common dependency of the other(s).  This is not
      allowed by the Xcode "new build system".

0x02A 864 days ago

Yeah, I gave it a try and the timings seem to be very wrong. I'll fix that soon.

I haven't tried benchmarking SPIR-V shaders on macOS. Since they're translated into Metal shaders anyways, it should be possible theoretically.

Also, for the command-line viewer in the new version, I've only tested Make and Ninja. I'll take a look at Xcode when I get a chance.

Update: I just gave Instruments a try and it seems like the Metal compiler grouped all of the compute and copy operations together and just left the timestamp operations to run back to back. Since MoltenVK isn't a conformant implementation, I'm guessing the synchronization dependencies weren't respected.

However, I'm still getting 200 ms frame times on the Garden scene at 4K with an M1 Pro. The Lego scene shouldn't be too bad even on an Intel MacBook.