If you want to play with software rendering, here's probably the shortest code that gets an ARGB8888 2D array from main memory to the screen efficiently on all platforms, using SDL2 in C: https://gist.github.com/CoryBloyd/6725bb78323bb1157ff8d4175d... You'll need to do the translation from a 320x200 8-bit palettized framebuffer to ARGB yourself ;)
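For the palette translation mentioned above, a minimal sketch in plain C might look like the following. The names `argb` and `expand_palettized` are made up for illustration and are not taken from the gist; the palette is assumed to be a 256-entry table of ready-made ARGB8888 values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack four 8-bit channels into one ARGB8888 pixel. */
static uint32_t argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Hypothetical helper: expand an 8-bit indexed framebuffer into ARGB8888.
 * src     - count palette indices (e.g. 320*200 of them)
 * palette - 256 precomputed ARGB8888 colours, assumed built with argb()
 * dst     - count output pixels */
static void expand_palettized(const uint8_t *src, const uint32_t *palette,
                              uint32_t *dst, size_t count) {
    for (size_t i = 0; i < count; i++) {
        dst[i] = palette[src[i]];  /* one table lookup per pixel */
    }
}
```

Building the palette as ARGB once, whenever it changes, keeps the per-frame work down to a single lookup per pixel. Call it with count = 320*200 for the framebuffer described above.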
At least with SDL3, you don't even need the renderer or the texture anymore: SDL_GetWindowSurface to get the surface and SDL_UpdateWindowSurface to present. That's about as close to pure software graphics as you can get, from my understanding of the library. SDL still does the double-buffering for you.
SDL has always made it easy to directly present a software buffer of pixels to the screen. I'm not sure why someone would want to use the renderer/texture thing for this use case.
Thank you for sharing this. There are a handful of very popular Quake forks already, but Planimeter publishes a Quake-VS2026 fork that doesn't introduce changes. The team is working on x64 builds, which requires replacing the old SciTech Multi-platform Graphics Library (x86 only) with SDL3 (or porting scitech-mgl to x64, which I don't think will happen) and, last I understood, the software renderer may be dropped.
But maybe a software renderer and SDL_Texture could preserve it?
It's certainly the most rudimentary. A small optimisation for the inner loop would be to pre-calculate the scanline offset before going into the pixel loop:
  int s = y*screenRect.w;
  for (int x = 0; x < screenRect.w; x++) {
    pixels[s + x] = argb(255, frame>>3, y+frame, x+frame);
  }
Certainly check the assembly, but loop invariant code motion and strength reduction are basic optimizations. C compilers tend to be good at optimizing indexing patterns even at -O1.
Take a look: GCC and Clang go further than these suggestions by adding screenRect.w to the pointer each iteration to avoid the multiplication: https://godbolt.org/z/YfroqK7T6
Writing anything but pixels[y*screenRect.w + x] in an attempt to be faster, without checking the assembly first, is obfuscation.
(For what it's worth, you can beat the compiler by using *pixels++. I didn't profile the code to check whether it was actually faster in practice, however.)
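For anyone who wants to compare the assembly or profile this themselves, here is a minimal sketch of the two forms being argued about: the plain `pixels[y*w + x]` indexing and the `*pixels++` variant. The function names are made up, `w` and `h` stand in for `screenRect.w` and `screenRect.h`, and `argb` is an assumed definition, since the gist's own helper isn't shown in this thread. It also assumes rows are tightly packed (no padding between scanlines), which is what makes the pointer-bump version valid.

```c
#include <stdint.h>

/* Assumed helper: pack four 8-bit channels into one ARGB8888 pixel. */
static uint32_t argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Plain form: the index is written out from y and x for every pixel. */
static void fill_indexed(uint32_t *pixels, int w, int h, int frame) {
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            pixels[y * w + x] = argb(255, (uint8_t)(frame >> 3),
                                     (uint8_t)(y + frame),
                                     (uint8_t)(x + frame));
        }
    }
}

/* Pointer-bump form: relies on rows being contiguous, so walking the
 * pointer forward visits pixels in the same row-major order. */
static void fill_pointer(uint32_t *pixels, int w, int h, int frame) {
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            *pixels++ = argb(255, (uint8_t)(frame >> 3),
                             (uint8_t)(y + frame),
                             (uint8_t)(x + frame));
        }
    }
}
```

Both functions should write identical buffers, so whichever one you keep is purely a readability and measurement question. No claim here about which is faster; that depends on the compiler and flags.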