by xaedes 1511 days ago
Nice demo! We need more of this approach.

You really can achieve amazing stuff with just plain OpenGL (for example) optimized for your rendering needs. With today's GPU acceleration capabilities we could have town-building games with huge map resolutions and millions of entities. Instead it's mostly only used to make fancy graphics.

Actually, I am currently trying to build something like that [1]. A big, big world with hundreds of millions of sprites is achievable and runs smoothly; video RAM is the limit. Admittedly it is not optimized to display those hundreds of millions of sprites all at once, maybe just a few million. That would be a bit too chaotic for a game anyway, I guess.

[1] https://www.youtube.com/watch?v=6ADWXIr_IUc

4 comments

> We need more of this approach.

1000% agree.

I recently took it upon myself to see just how far I can push modern hardware with some very tight constraints. I've been playing around with a 100% custom 3D rasterizer which operates purely on the CPU. For reasonable scenes (<10k triangles) and resolutions (720p to 1080p), I have been able to push over 30fps with a single thread. On a 5950X, I was able to support over 10 clients simultaneously without any issues. The GPU in my workstation is just moving the final content to the display device via whatever means necessary. The machine generating the frames doesn't even need a graphics device installed at all...

To be clear, this is exceptionally primitive graphics capability, but there are many styles of interactive experience that do not demand 4k textures, global illumination, etc. I am also not fully extracting the capabilities of my CPU. There are many optimizations (e.g. SIMD) that could be applied to get even more uplift.

One fun thing I discovered is just how low latency a pure CPU rasterizer can be compared to a full CPU-GPU pipeline. I have CPU-only user-interactive experiences that can go from input event to final output frame in under 2 milliseconds. I don't think even games like Overwatch can react to user input that quickly.
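As a rough illustration of what "rasterizing on the CPU" means, here is a minimal Python sketch that fills one triangle using edge functions and a bounding-box scan. It is not the renderer described above (that one is written in C#); the function names `edge` and `rasterize_triangle`, the list-of-lists framebuffer, and the pixel-center sampling rule are all assumptions made for this example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed edge function: positive when P is on one side of A->B, negative on the other."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(framebuffer, v0, v1, v2, color):
    """Fill a 2D triangle into a row-major framebuffer; returns the number of pixels written."""
    height = len(framebuffer)
    width = len(framebuffer[0])

    # Scan only the triangle's bounding box, clamped to the framebuffer.
    min_x = max(int(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(int(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(int(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(int(max(v0[1], v1[1], v2[1])), height - 1)

    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    if area == 0:
        return 0  # degenerate triangle, nothing to draw

    covered = 0
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py)
            w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py)
            w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py)
            # A point is inside when all three edge values share the sign of the area,
            # which makes the test independent of vertex winding order.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                framebuffer[y][x] = color
                covered += 1
    return covered
```

A real software renderer would add depth testing, clipping, and incremental evaluation of the edge functions (plus SIMD, as mentioned above); this only shows the core coverage test.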

Just to be clear - you're writing a "software-based" 3D renderer, right? This is the sort of thing I excelled at back in the late 80s, early 90s, before the first 3D accelerators turned up around 1995 I think.

What features does your renderer support in terms of shading and texturing? Are you writing this all in a high-level language, e.g. C, or assembler? If assembler, what CPUs and features are you targeting?

And of course, why?

> you're writing a "software-based" 3D renderer, right?

Yes. This is 100% what you are familiar with.

> What features does your renderer support in terms of shading and texturing?

I have a software-defined pixel shading approach that allows for some degree of flexibility throughout. Each object in the scene currently defines a function that describes how to shade its final pixels based on a few parameters.
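To make the "each object defines a shading function" idea concrete, here is a hedged Python sketch. The actual parameters are not stated in the thread, so the `(u, v, depth)` signature, the `SceneObject` class, and the three example shaders are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Color = Tuple[int, int, int]
# Hypothetical shader signature: surface coordinates (u, v) and a depth in [0, 1] -> RGB.
Shader = Callable[[float, float, float], Color]


@dataclass
class SceneObject:
    name: str
    shade: Shader  # per-object function deciding the final pixel color


def flat_red(u: float, v: float, depth: float) -> Color:
    """Ignore all inputs and return a constant color."""
    return (255, 0, 0)


def checker(u: float, v: float, depth: float) -> Color:
    """8x8 checkerboard over the unit square of (u, v)."""
    on = (int(u * 8) + int(v * 8)) % 2 == 0
    return (255, 255, 255) if on else (0, 0, 0)


def depth_fog(u: float, v: float, depth: float) -> Color:
    """Grey level that darkens with distance; depth is clamped to [0, 1]."""
    clamped = min(max(depth, 0.0), 1.0)
    level = int(255 * (1.0 - clamped))
    return (level, level, level)


def shade_pixel(obj: SceneObject, u: float, v: float, depth: float) -> Color:
    """What a rasterizer's inner loop would call once per covered pixel."""
    return obj.shade(u, v, depth)
```

For example, `shade_pixel(SceneObject("wall", checker), 0.0, 0.0, 0.5)` dispatches to `checker`, so swapping an object's look only means swapping its function.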

> Are you writing this all in a high-level language, e.g. C, or assembler?

I am writing this in C#/.NET 6. I do have unsafe turned on for pointer access over low-level bitmap operations, but otherwise it's all fully managed runtime.

> And of course, why?

Because I want to see if I can actually build an effective gaming experience without a GPU in 2022. The secondary objective is simply to learn some new stuff that isn't boring banking CRUD apps.

That's awesome. I think the advantage of a software renderer is that you can adapt your inner loops to do things that a GPU can't do. You can create some new form of polygon-fill that isn't supported by Direct3D or OpenGL etc.

Plus, of course it will run on anything.

I hope you'll be willing to open the code at some point...

Unrelated but wrt. modern rendering versus 90s rendering I'd imagine that a lot of the performance shims used in the 90s might not apply because the critical problem is different.

Performance-based development these days isn't so much about maximizing usage of the cycles of the machine (I mean, OK, fundamentally it's still about that, but-), rather it's about getting the microcode to do the right thing. E.g. LUTs being extremely bad for caching performance. Branch prediction being a much more important predictor of performance than anything else. Huge RAM sizes make a lot of old tips around RAM usage invalid. SIMD / vector-based operations and threading are a boon but require a very different way of working.

Even if your mental model is as simple as "CPU processing + L1 cache is infinitely fast, having to fetch data from anywhere else is dog slow" you'll be able to optimize code pretty well given the characteristics of modern processors.

If modern high performance code relies on making the microcode do "the right thing", and making sure the right data is in cache then why don't CPU manufacturers allow control over such things?

What’s right for today’s CPU is horrible for next year’s in those terms - and also the other way around.

> One fun thing I discovered is just how low latency a pure CPU rasterizer can be compared to a full CPU-GPU pipeline

i'm definitely going to have to test that! always trying to minimize input delay

I think it can reduce input delay enough to change streaming gaming economics, but the current state of the cloud economy makes it difficult to scale in practice.

i'm just starting to learn DirectX and noticed it can render a triangle at 12,000 fps! i had no clue this was possible. i don't think there's any room for input delay there, but i'll find out

Did you consider using an existing software rasterizer, like Mesa llvmpipe? Or was part of the challenge writing one yourself (nothing wrong with that)?

The upper rendering limit generally isn't explored deeply by games because as soon as you add simulation behaviors, it imposes new bottlenecks. And the design space of "large scale" is often restricted by what is necessary to implement it; many of Minecraft's bugs, for example, are edge cases of streaming in the world data in chunks.
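One generic example of a chunk-streaming edge case is mapping negative world coordinates to chunk indices. The Python sketch below is only an illustration of that class of problem, not a description of any specific Minecraft bug; `CHUNK_SIZE` and the function names are assumptions.

```python
CHUNK_SIZE = 16  # assumed chunk width in world units


def chunk_index_floor(x: int) -> int:
    """Chunk index using floor division, so chunks tile evenly across x = 0."""
    return x // CHUNK_SIZE


def chunk_index_truncating(x: int) -> int:
    """Chunk index using round-toward-zero division, as integer division does in C-like languages."""
    return int(x / CHUNK_SIZE)


def local_offset(x: int) -> int:
    """Position inside the chunk, always in the range 0..CHUNK_SIZE-1."""
    return x % CHUNK_SIZE
```

With truncating division, `x = -1` and `x = 1` both land in chunk 0, so that chunk ends up covering 31 units instead of 16; floor division puts `x = -1` in chunk -1 at local offset 15.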

Thus games that ship to a schedule are hugely incentivized to favor making smaller play spaces with more authored detail, since that controls all the outcomes and reduces the technical dependencies of how scenes are authored.

There is a more philosophical reason to go in that direction too: Simulation building is essentially the art of building Plato's cave, and spending all your time on making the cave very large and the puppets extremely elaborate is a rather dubious idea.

Is this not done because of technical limitations, or is it just not done because a town building game with millions of entities would not be fun/manageable for the player?

Although, there are a few space 4X games that try this "everything is simulated" kind of approach and succeed. Allowing AI control of everything the player doesn't want to manage themselves is one nice way of dealing with it. See: https://store.steampowered.com/app/261470/Distant_Worlds_Uni...

I immediately thought of the bullet physics games like Gradius, Parodius, Raiden, R-Type.

What made it of course was the art. An army of digital illustrators working by hand to create bitmaps that pop.

One pseudo-2.5D game I'm playing now is Iridion 2 GBA (2003). You can see the care taken by the art design team, pure lovers of the genre ;)