by kvark 529 days ago
Because it has to go through the GPU anyway before it reaches the screen, the GPU can be more efficient at doing this (better battery life, etc.), and we are wasting time transferring pixels to the GPU when the splines would be much more compact.

minor nit: it seems like they're not rasterizing every pixel on the CPU, just generating heightmap values instead, which is a lot lower resolution?

and games like Dreams have proven that you can ship world-class experiences using CPU rasterization. If it's easier and it performs well enough, there's nothing wrong with it.

Well, there are two different things here.

The custom CPU rasteriser (Star Machine) that pushes 4K at 120 Hz is mentioned in the intro, but the implementation of spline-based terrain covered by the article is just a prototype developed in Blender. Blender is used for rapid iteration on the algorithm.

While the Blender version is at least partially GPU accelerated, the final implementation in Star Machine will be entirely on the CPU. It's currently unknown whether the CPU implementation will trace against the cached height map or against a sparse point cloud (also cached).

I looked at the Star Machine repository and it looks like it's using SDL_gpu [1,2], so I am a little confused about where the 'CPU' rasteriser designation comes from.

[1] https://github.com/Aeva/star-machine/blob/excelsior/star_mac...

[2] https://wiki.libsdl.org/SDL3/CategoryGPU

I haven't read through the whole thing, but my rough understanding is:

- A ray tracer runs on the CPUs, and generates surfels (aka splats)

- The surfels are uploaded to the GPU

- Then the GPU rasterizes the surfels into a framebuffer (and draws the UI, probably other things too)

So it's the ray tracing that's running on the CPU, not the rasterizer. Compared to a traditional CPU ray tracer, the GPU is not idle and is still doing what it does best (memory-intensive rasterization), while the CPU does the branchy parts of ray tracing (which GPU ray tracing implementations struggle with).
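
To make that split concrete, here is a toy sketch of the three steps as I understand them. None of this is Star Machine's code: the `Surfel` layout, the sine-based `terrain_height` stand-in, the top-down orthographic camera, and the pure-Python `rasterize` (standing in for the GPU stage) are all assumptions made up for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Surfel:
    # Hypothetical layout: world-space position plus a splat radius.
    x: float
    y: float
    z: float
    radius: float


def terrain_height(x, y):
    # Stand-in terrain; the article's terrain is spline-based, this is not.
    return 0.5 * math.sin(3.0 * x) * math.cos(2.0 * y)


def trace_surfels(count, seed=0):
    """CPU stage: cast vertical rays at the terrain, emit one surfel per hit."""
    rng = random.Random(seed)
    surfels = []
    for _ in range(count):
        x, y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        surfels.append(Surfel(x, y, terrain_height(x, y), radius=0.05))
    return surfels


def rasterize(surfels, width, height_px):
    """Stand-in for the GPU stage: splat surfels into a depth buffer.

    The camera looks straight down at the unit square, so a larger z
    is closer to the camera and wins the depth test.
    """
    depth = [[-math.inf] * width for _ in range(height_px)]
    for s in surfels:
        # Pixel-space bounding box of the splat, clamped to the framebuffer.
        x0 = max(0, int((s.x - s.radius) * width))
        x1 = min(width - 1, int((s.x + s.radius) * width))
        y0 = max(0, int((s.y - s.radius) * height_px))
        y1 = min(height_px - 1, int((s.y + s.radius) * height_px))
        for py in range(y0, y1 + 1):
            for px in range(x0, x1 + 1):
                # Pixel centre mapped back into world space.
                cx = (px + 0.5) / width
                cy = (py + 0.5) / height_px
                if (cx - s.x) ** 2 + (cy - s.y) ** 2 <= s.radius ** 2:
                    depth[py][px] = max(depth[py][px], s.z)
    return depth


if __name__ == "__main__":
    surfels = trace_surfels(200)          # step 1: CPU generates surfels
    # step 2 (upload to the GPU) has no equivalent in this toy version
    frame = rasterize(surfels, 32, 32)    # step 3: splat into a framebuffer
    covered = sum(1 for row in frame for d in row if d != -math.inf)
    print(f"{len(surfels)} surfels covered {covered} of {32 * 32} pixels")
```

The point of the toy is only the shape of the data flow: the expensive, branchy work produces a short list of surfels, and the part that touches every pixel happens after the hand-off.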

The number of surfels can be adjusted dynamically per frame to keep a consistent framerate, and there will be far fewer surfels than pixels, reducing the PCIe bandwidth requirements. The surfels can also be dynamically allocated within a frame, focusing on the parts with high-frequency detail.
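
A back-of-the-envelope sketch of both ideas, under stated assumptions: the damped proportional controller in `next_surfel_budget`, its `gain` and clamp limits, and the 32-byte surfel size are all hypothetical choices of mine, not anything taken from Star Machine. Only the 4K at 120 Hz figure comes from the thread.

```python
def next_surfel_budget(budget, frame_ms, target_ms,
                       lo=1_000, hi=5_000_000, gain=0.5):
    """Nudge the per-frame surfel count toward the target frame time.

    A damped proportional step: move only part of the way (gain) toward
    the count that would hit the target if cost scaled linearly.
    """
    if frame_ms <= 0:
        return budget
    ratio = target_ms / frame_ms
    scaled = budget * (1.0 + gain * (ratio - 1.0))
    return int(min(hi, max(lo, scaled)))


def pixel_bytes_per_second(width, height, hz, bytes_per_pixel=4):
    """Bandwidth needed to upload a full framebuffer every frame."""
    return width * height * bytes_per_pixel * hz


def surfel_bytes_per_second(count, hz, bytes_per_surfel=32):
    """Bandwidth needed to upload `count` surfels every frame.

    bytes_per_surfel is an assumed size, not a measured one.
    """
    return count * bytes_per_surfel * hz


if __name__ == "__main__":
    # Frame took twice as long as the target, so the budget shrinks.
    print(next_surfel_budget(100_000, frame_ms=16.0, target_ms=8.0))

    # Full 4K RGBA8 frames at 120 Hz: about 3.98 GB/s.
    print(pixel_bytes_per_second(3840, 2160, 120) / 1e9)

    # Same rate with a hypothetical 500k surfels per frame.
    print(surfel_bytes_per_second(500_000, 120) / 1e9)
```

How much bandwidth is actually saved depends entirely on the real surfel size and count, so treat the last number as a placeholder rather than a measurement.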

It's an interesting design, and I'm curious what else is possible. Can you do temporal reprojection of surfels? Can you move materials partially onto the GPU?