Hacker News
by w-m 817 days ago
Great project, it's nice to see so many people are building stuff with Gaussian Splatting.

Just the other day I went through the whole list of known viewers on MrNeRF's awesome 3DGS resources to find one that runs on a MacBook. I'm working on compressing 3D scenes by sorting Gaussians into 2D grids [0], and I wanted a native viewer that I could use for experiments on the go, perhaps as an alternative backend to the CUDA one in my colleague's exploratory Python viewer [1].

VulkanSplatting was the only one I could get to compile and run on my Intel MacBook. Unfortunately, the feeble Intel GPU can't render even the Lego scene at an interactive frame rate. Do you think there's enough performance headroom to make that possible in the future, or should I give up on running this on an Intel MBP?

[0]: https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/

[1]: https://github.com/Florian-Barthel/gaussian_viewer

Thanks for trying it out! I haven't had the opportunity to benchmark this on an Intel MacBook. Were you able to see which kernel takes the most time? There should be a performance graph if you have the GUI enabled.

In my Apple Silicon benchmarks, the main bottleneck is the parallel radix sort that sorts the Gaussians by tile and depth. I used some shaders from a sorting library, but they lag behind SOTA parallel sort algorithms. I think fixing this would give a 1.5x overall performance boost, and maybe 3x on MacBooks. Also, the wave size isn't tuned for different GPUs.
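A minimal sketch of sorting by tile and depth with a single integer key, assuming the common layout of tile ID in the upper 32 bits and the float32 depth's bit pattern in the lower 32 bits (the thread doesn't show this project's actual shaders). Because visible Gaussians have positive depth, their IEEE-754 bit patterns sort in the same order as their values, so one key sort orders entries by tile first and front to back within each tile. The sort here is a plain sequential LSD radix sort; a GPU version would run each 8-bit pass as a parallel histogram, prefix sum, and scatter.

```python
import struct


def float_bits(x: float) -> int:
    """Reinterpret a float32 as its uint32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def make_key(tile_id: int, depth: float) -> int:
    """Pack tile ID (high 32 bits) and depth (low 32 bits) into one 64-bit key.

    Assumes depth > 0, so the float bit pattern is monotonic in the value.
    """
    return (tile_id << 32) | float_bits(depth)


def radix_sort(keys, values, key_bits=64, digit_bits=8):
    """Stable LSD radix sort of (key, value) pairs, one digit per pass."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for k, v in zip(keys, values):
            buckets[(k >> shift) & mask].append((k, v))
        # Emitting buckets in order keeps each pass stable, which LSD radix sort relies on.
        keys = [k for bucket in buckets for k, _ in bucket]
        values = [v for bucket in buckets for _, v in bucket]
    return keys, values


# Hypothetical (gaussian_id, tile_id, depth) instances, one per overlapped tile.
instances = [(0, 2, 1.5), (1, 0, 3.0), (2, 2, 0.5), (3, 0, 1.0)]
keys = [make_key(tile, depth) for _, tile, depth in instances]
ids = [gid for gid, _, _ in instances]
sorted_keys, sorted_ids = radix_sort(keys, ids)
print(sorted_ids)  # [3, 1, 2, 0]: tile 0 front to back, then tile 2 front to back
```

Since the tile ID rarely needs all 32 high bits, one common optimization is to sort only 32 + ceil(log2(num_tiles)) bits, which here means passing a smaller `key_bits` and skipping passes that would only ever see zero digits.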

Another area of improvement is better management of the shared memory. Right now, we just let the driver manage it as the L1 cache. However, we could manage it manually and group Gaussian retrievals together for the same tile. This is what the official implementation does.
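Once the list is sorted, each tile's Gaussians sit in one contiguous range, which is what makes grouped retrieval possible. The sketch below is a hypothetical CPU model, not the project's code: it finds each tile's range and counts global-memory fetches when every pixel reads every Gaussian in its tile itself, versus when a tile's threads cooperatively load each batch into shared memory once and then all read it from there. It assumes one thread per pixel, so a batch holds as many Gaussians as a tile has pixels (256 for an assumed 16x16 tile). The per-pixel count is an upper bound, since the L1 cache absorbs some of those reads in practice.

```python
import math


def tile_ranges(sorted_tiles, num_tiles):
    """Return the [start, end) range of each tile's entries in the sorted list."""
    ranges = [(0, 0)] * num_tiles  # tiles with no Gaussians keep an empty range
    start = 0
    for i, tile in enumerate(sorted_tiles):
        if i == 0 or sorted_tiles[i - 1] != tile:
            start = i  # first entry of a new tile
        if i == len(sorted_tiles) - 1 or sorted_tiles[i + 1] != tile:
            ranges[tile] = (start, i + 1)  # last entry of this tile
    return ranges


def global_fetches(ranges, pixels_per_tile):
    """Compare global-memory Gaussian fetches for the two strategies."""
    # Each pixel walks its tile's whole range, fetching every Gaussian itself.
    per_pixel = sum(pixels_per_tile * (end - start) for start, end in ranges)
    # With shared memory, each Gaussian is fetched once per tile, in rounds of
    # pixels_per_tile loads (one per thread) separated by a barrier.
    shared = sum(end - start for start, end in ranges)
    rounds = sum(math.ceil((end - start) / pixels_per_tile) for start, end in ranges)
    return per_pixel, shared, rounds


# Tile IDs of a hypothetical sorted list: tile 0 has 2 Gaussians, tile 1 none, tile 2 has 3.
ranges = tile_ranges([0, 0, 2, 2, 2], num_tiles=3)
print(ranges)                       # [(0, 2), (0, 0), (2, 5)]
print(global_fetches(ranges, 256))  # (1280, 5, 2)
```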

Although 3DGS is the first radiance field method with SOTA quality that runs in real time, I think it's still quite heavy. Due to the explicit representation of the scene, a lot of operations are memory bound. If you can't get an interactive frame rate right now, it's unlikely the improvements will make a material difference.

Hopefully that's where your work on compression comes in and solves the problem :)
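To put the memory-bound point in numbers: each Gaussian in the original 3DGS format stores 59 float32 values (position 3, scale 3, rotation quaternion 4, opacity 1, and 48 degree-3 spherical-harmonic coefficients), or 236 bytes. The back-of-envelope sketch below turns a Gaussian count and a memory bandwidth into a floor on frame time for reading those parameters once. The inputs are hypothetical placeholders, not measurements from this thread, and real traffic is higher because sorting and per-tile rendering read and write additional data.

```python
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # position, scale, rotation, opacity, SH (degree 3)
BYTES_PER_FLOAT = 4  # float32


def min_frame_ms(num_gaussians, bandwidth_gb_per_s, reads_per_gaussian=1.0):
    """Lower bound on frame time if reading Gaussian parameters were the only cost."""
    bytes_per_frame = num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT * reads_per_gaussian
    return bytes_per_frame / (bandwidth_gb_per_s * 1e9) * 1e3


# Hypothetical inputs: 1 million Gaussians on a GPU with 50 GB/s of usable bandwidth.
print(round(min_frame_ms(1_000_000, 50), 2))  # 4.72
```

The floor scales linearly with the Gaussian count, the bytes stored per Gaussian, and how many times each one is read per frame, and inversely with bandwidth.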

For some reason, the GUI doesn't show up for me in 3DGS.cpp (I checked out the repo, made a build folder, built with Ninja, then launched ./apps/viewer/vulkan_splatting_viewer).

I still have a VulkanSplatting build from this Wednesday, though. In VulkanSplatting, looking at the Lego scene from above with the default window size, I'm getting just below 1 ms for the sorting kernel and just above 1 ms for "render"; everything else is too small in the graph to register. But it only displays at a handful of fps, so a lot of time seems to be unaccounted for.

Maybe spinning up Instruments could give some more insight into what's happening? I tried `cmake -G Xcode` to set that up easily, but the Xcode CMake generation fails with:

    CMake Error in src/shaders/CMakeLists.txt:
      The custom command generating
    
        3DGS.cpp/build-xcode/shaders/shaders.h
    
      is attached to multiple targets:
    
        shaders
        xcode_shaders
    
      but none of these is a common dependency of the other(s).  This is not
      allowed by the Xcode "new build system".

Yeah, I gave it a try and the timings seem to be very wrong. I'll fix that soon.

I haven't tried benchmarking SPIR-V shaders on macOS. Since they're translated into Metal shaders anyway, it should theoretically be possible.

Also, for the command-line viewer in the new version, I've only tested Make and Ninja. I'll take a look at Xcode when I get a chance.

Update: I just gave Instruments a try and it seems like the Metal compiler grouped all of the compute and copy operations together and just left the timestamp operations to run back to back. Since MoltenVK isn't a conformant implementation, I'm guessing the synchronization dependencies weren't respected.

However, I'm still getting 200 ms frame times on the Garden scene at 4K with an M1 Pro. The Lego scene shouldn't be too bad even on an Intel MacBook.
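GPU timestamps only attribute time correctly if each timestamp write executes between the stages it brackets. A small sketch of the bookkeeping, assuming Vulkan-style tick counts converted to nanoseconds with `VkPhysicalDeviceLimits::timestampPeriod` (the stage names, tick values, and frame time below are hypothetical), shows how timestamps that run back to back yield near-zero stage times and leave almost the whole frame unaccounted for, which matches the symptom in the VulkanSplatting performance graph described above.

```python
def stage_times_ms(labels, ticks, timestamp_period_ns):
    """Convert consecutive timestamps into per-stage durations in milliseconds.

    ticks[i] and ticks[i + 1] bracket the stage labels[i].
    """
    if len(ticks) != len(labels) + 1:
        raise ValueError("need exactly one more timestamp than stages")
    return {
        label: (ticks[i + 1] - ticks[i]) * timestamp_period_ns / 1e6
        for i, label in enumerate(labels)
    }


def unaccounted_ms(stages, frame_ms):
    """Frame time that no measured stage explains."""
    return frame_ms - sum(stages.values())


labels = ["preprocess", "sort", "render"]  # hypothetical stage names
# Timestamps written where intended: each stage's duration is captured.
ordered = stage_times_ms(labels, [0, 500_000, 1_400_000, 2_500_000], 1.0)
# Timestamps reordered to run back to back: the real work happens outside them.
packed = stage_times_ms(labels, [0, 10, 20, 30], 1.0)

print(unaccounted_ms(ordered, frame_ms=2.5))  # close to zero
print(unaccounted_ms(packed, frame_ms=2.5))   # nearly the whole frame
```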