| HN Mirror


by corysama 8 days ago

If you want to play with software rendering, here's probably the shortest code that will get an ARGB8888 2D array from main memory to the screen efficiently on all platforms, using SDL2 in C: https://gist.github.com/CoryBloyd/6725bb78323bb1157ff8d4175d... You'll need to do the translation from a 320x200x8-bit palettized framebuffer to ARGB yourself ;)
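For anyone wondering what that translation step involves, here is a minimal sketch, assuming a 256-entry palette already stored as ARGB8888 values. The names `pack_argb`, `expand_indexed_to_argb`, `FB_WIDTH`, `FB_HEIGHT`, and `PALETTE_SIZE` are hypothetical and do not come from the gist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions matching the 320x200x8-bit case mentioned above. */
enum { FB_WIDTH = 320, FB_HEIGHT = 200, PALETTE_SIZE = 256 };

/* Pack four 8-bit channels into one ARGB8888 pixel (alpha in the top byte). */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Expand an 8-bit indexed framebuffer into ARGB8888 with one table lookup
 * per pixel. Both buffers are assumed to be tightly packed (no row padding). */
void expand_indexed_to_argb(const uint8_t *indexed,
                            const uint32_t *palette, /* PALETTE_SIZE entries */
                            uint32_t *argb_out,
                            size_t pixel_count)
{
    assert(indexed != NULL && palette != NULL && argb_out != NULL);
    for (size_t i = 0; i < pixel_count; i++) {
        argb_out[i] = palette[indexed[i]];
    }
}
```

Calling `expand_indexed_to_argb(fb, palette, out, FB_WIDTH * FB_HEIGHT)` once per frame would produce the ARGB buffer to hand to SDL. Palette-cycling effects then only need to rewrite the 256 palette entries, not the framebuffer.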

If you want to get inspired by what can be done with palettized framebuffers, check out http://www.effectgames.com/demos/canvascycle/ (click Show Options) and the GDC presentation by the artist: https://youtu.be/aMcJ1Jvtef0

With that you can fire up https://github.com/mriale/PyDPainter for that classic Deluxe Paint IIe vibe. Or, https://www.aseprite.org/ for something more modern.

3 comments

yunnpp 8 days ago

At least with SDL3, you don't even need the renderer or the texture anymore: call SDL_GetWindowSurface to get the surface and SDL_UpdateWindowSurface to present. That's the most direct software-graphics path you can get, from my understanding of the library. SDL still does the double-buffering for you.


TazeTSchnitzel 7 days ago

SDL has always made it easy to directly present a software buffer of pixels to the screen. I'm not sure why someone would want to use the renderer/texture thing for this use case.


bellowsgulch 8 days ago

Thank you for sharing this. There's a handful of very popular Quake forks already, but Planimeter publishes a Quake-VS2026 fork that doesn't introduce changes. The team is working on x64 builds, which requires replacing the old SciTech Multi-platform Graphics Library (x86 only) with SDL3 (or porting scitech-mgl to x64, which I don't think will happen), and the last I understood, the software renderer may be dropped.

But maybe a software renderer and SDL_Texture could preserve it?


pan69 8 days ago

It's certainly the most rudimentary. A small optimisation to the inner loop would be to pre-calculate the scanline offset before going into the pixel loop:

    int s = y*screenRect.w;

    for (int x = 0; x < screenRect.w; x++) {
        pixels[s + x] = argb(255, frame>>3, y+frame, x+frame);
    }


kmill 8 days ago

I'd be surprised if the compiler didn't make that optimisation on its own.


canyp 8 days ago

Possibly, but always check the assembly.

The even faster version, compiler optimisations aside, would be to initialize the pointer at y*screenRect.w and increment it on every iteration to avoid the addressing arithmetic.


kmill 8 days ago

Certainly check the assembly, but loop invariant code motion and strength reduction are basic optimizations. C compilers tend to be good at optimizing indexing patterns even at -O1.

Take a look: GCC and Clang go further than these suggestions by adding screenRect.w to the pointer on each row iteration to avoid the multiplication: https://godbolt.org/z/YfroqK7T6

Writing anything but pixels[y*screenRect.w + x] in an attempt to be faster, without checking the assembly first, is obfuscation.

(For what it's worth, you can beat the compiler by using *pixels++. I didn't profile the code to check it actually was faster in practice however.)
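To make the comparison concrete, here is a sketch of the plain-indexing form next to the `*pixels++` form. It assumes a tightly packed buffer, so the row pitch equals the width in pixels. The `argb` helper and the `fill_indexed` and `fill_pointer` names are stand-ins written for this example, not copied from the gist.

```c
#include <stdint.h>

/* Stand-in for the gist's argb(): masks each channel to 8 bits and packs
 * them as ARGB8888. The real helper may differ. */
uint32_t argb(uint32_t a, uint32_t r, uint32_t g, uint32_t b)
{
    return ((a & 0xFFu) << 24) | ((r & 0xFFu) << 16) |
           ((g & 0xFFu) << 8)  |  (b & 0xFFu);
}

/* The readable form: index with y*w + x and let the compiler hoist the
 * multiplication out of the inner loop. */
void fill_indexed(uint32_t *pixels, int w, int h, int frame)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            pixels[y * w + x] = argb(255, frame >> 3, y + frame, x + frame);
        }
    }
}

/* The hand-written form: one running pointer, advanced once per pixel.
 * Only valid when rows are contiguous in memory (no padding between rows). */
void fill_pointer(uint32_t *pixels, int w, int h, int frame)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            *pixels++ = argb(255, frame >> 3, y + frame, x + frame);
        }
    }
}
```

Both functions write identical buffers, so any difference between them is purely in the generated code, which is something to confirm in the assembly and with a profiler before assuming one is faster.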
