Hacker News new | ask | show | jobs
by bob1029 1511 days ago
> We need more of this approach.

1000% agree.

I recently took it upon myself to see just how far I can push modern hardware with some very tight constraints. I've been playing around with a 100% custom 3D rasterizer which purely operates on the CPU. For reasonable scenes (<10k triangles) and resolutions (720~1080p), I have been able to push over 30fps with a single thread. On a 5950x, I was able to support over 10 clients simultaneously without any issues. The GPU in my workstation is just moving the final content to the display device via whatever means necessary. The machine generating the frames doesnt even need a graphics device installed at all...

To be clear, this is exceptionally primitive graphics capability, but there are many styles of interactive experience that do not demand 4k textures, global illumination, etc. I am also not fully extracting the capabilities of my CPU. There are many optimizations (e.g. SIMD) that could be applied to get even more uplift.

One fun thing I discovered is just how low latency a pure CPU rasterizer can be compared to a full CPU-GPU pipeline. I have CPU-only user-interactive experiences that can go from input event to final output frame in under 2 milliseconds. I don't think even games like Overwatch can react to user input that quickly.

3 comments

Just to be clear - you're writing a "software-based" 3D renderer, right? This is the sort of thing I excelled at back in the late 80s, early 90s, before the first 3D accelerators turned up around 1995 I think.

What features does your renderer support in terms of shading and texturing? Are you writing this all in a high-level language, e.g. C, or assembler? If assembler, what CPUs and features are you targeting?

And of course, why?

> you're writing a "software-based" 3D renderer, right?

Yes. This is 100% what you are familiar with.

> What features does your renderer support in terms of shading and texturing?

I have a software-defined pixel shading approach that allows for some degree of flexibility throughout. Each object in the scene currently defines a function that describes how shade its final pixels based on a few parameters.

> Are you writing this all in a high-level language, e.g. C, or assembler?

I am writing this in C#/.NET6. I do have unsafe turned on for pointer access over low-level bitmap operations, but otherwise its all fully-managed runtime.

> And of course, why?

Because I want to see if I can actually build an effective gaming experience without a GPU in 2022. Secondary objective is simply to learn some new stuff that isnt boring banking CRUD apps.

That's awesome. I think the advantage of a software renderer is that you can adapt your inner loops to do things that a GPU can't do. You can create some new form of polygon-fill that isn't supported by Direct3D or OpenGL etc.

Plus, of course it will run on anything.

I hope you'll be willing to open the code at some point...

Unrelated but wrt. modern rendering versus 90s rendering I'd imagine that a lot of the performance shims used in the 90s might not apply because the critical problem is different.

Performance based development these days isn't so much on maximizing usage of the cycles of the machine (I mean, ok fundamentally it's still about that, but-), rather it's about getting the microcode to do the right thing. E.g. LUTs being extremely bad for caching performance. Branch predictions being a much more important predictor of performance than anything else. Huge rams make a lot of old tips around ram size usage invalid. SIMD / vector based operations and threading are a boon but require a very different way of working

Even if your mental model is as simple as "CPU processing + L1 cache is infinitely fast, having to fetch data from anywhere else is dog slow" you'll be able to optimize code pretty well given the characteristics of modern processors.
If modern high performance code relies on making the microcode do "the right thing", and making sure the right data is in cache then why don't CPU manufacturers allow control over such things?
What’s right for today’s CPU is horrible for next year’s in those terms - and also the other way around.
> One fun thing I discovered is just how low latency a pure CPU rasterizer can be compared to a full CPU-GPU pipeline

i'm definitely going to have to test that! always trying to minimize input delay

I think it can reduce input delay enough to change streaming gaming economics, but the current state of cloud economy makes it difficult to scale in practice.
i'm just starting learning directx and noticed it can render a triangle at 12,000 fps! i had no clue this was possible. i don't think there's any room for input delay there, but i'll find out
Did you consider using an existing software rasterizer, like Mesa llvmpipe? Or part of the challenge was writing one yourself (nothing wrong with that)?