by vardump 630 days ago
But with a fraction of the CPU resources. The Arduino Nano's Cortex-M33 is overclocked to 135 MHz, while the GBA's ARM7TDMI runs at a mere 16.78 MHz.

The ARM7TDMI takes 1-4 cycles to perform a simple 32-bit x 32-bit multiply, depending on the value of the multiplier. I believe the Cortex-M33 takes just 1 cycle to do the same. The ARM7TDMI has no divide instruction and, critically, no FPU, which Quake requires.

The GBA has only 32 kB of zero-wait-state RAM (AKA internal working RAM), versus 276 kB on the Arduino Nano.

The GBA's 256 kB RAM block (external working RAM) has a massive 6-cycle access time when loading a 32-bit value.

It's a true miracle someone managed to get even 1/3 of the resolution on this weak hardware!


I think the article says the same. The GBA port is impressive.

I guess an FPU would not even be required at 120 px horizontal resolution.

The CM33 does even more in a single cycle: for instance, two 16-bit multiplications plus addition and accumulation.
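To make the "two multiplies plus accumulate" point concrete, here is a small Python emulation of what such a dual multiply-accumulate computes. It is modeled on the semantics of the Arm DSP-extension instruction SMLAD, which is presumably what is meant here; the helper names `to_signed16` and `pack16` are made up for this sketch, and saturation/flag behavior is ignored.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack16(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def smlad(rn, rm, acc):
    """acc + lo(rn)*lo(rm) + hi(rn)*hi(rm), wrapped to a signed 32-bit result."""
    total = acc
    total += to_signed16(rn) * to_signed16(rm)              # low halves
    total += to_signed16(rn >> 16) * to_signed16(rm >> 16)  # high halves
    total &= 0xFFFFFFFF                                     # wrap like a 32-bit register
    return total - 0x100000000 if total & 0x80000000 else total


# 3*5 + (-4)*6 + 10
print(smlad(pack16(3, -4), pack16(5, 6), 10))  # 1
```

On the real core this whole function is a single instruction, which is why it helps so much for things like fixed-point dot products.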

Still, it is the first time the "full" Quake has been ported in less than 300 kB.

Agreed on the other counts, except for the FPU.

Quake performs one FPU divide per pixel for texture mapping perspective correction.

ARM7TDMI does not have any kind of divide, so perspective correction is tricky, even if it's just 120 px horizontally.
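One common way around a missing divide instruction is a reciprocal lookup table, so that a division becomes a table load, a multiply and a shift. A minimal sketch, assuming 16 fractional bits and a 256-entry table (both sizes are arbitrary choices for illustration, not taken from any actual port):

```python
FRAC_BITS = 16     # assumed fixed-point precision of the stored reciprocals
TABLE_SIZE = 256   # assumed range of divisors covered by the table

# Built once (offline or at startup), so the divides here are not paid per pixel.
# Index 0 is unused because 1/0 is undefined.
RECIP = [0] + [(1 << FRAC_BITS) // z for z in range(1, TABLE_SIZE)]


def approx_div(a, z):
    """Approximate a // z for 1 <= z < TABLE_SIZE with one multiply and one shift."""
    return (a * RECIP[z]) >> FRAC_BITS


print(approx_div(1000, 8), 1000 // 8)
print(approx_div(1000, 7), 1000 // 7)
```

Because the stored reciprocals are rounded down, the result can come out one lower than the true quotient (for non-negative `a` below 65536 it is never off by more than one), which is usually acceptable for texture coordinates.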

Afaik, Quake does not do one divide per pixel; it divides in steps of 8 pixels (see dscan.c in winquake). Yes, there is no divide instruction, but instead of a software divide taking hundreds of cycles, tables and other approximations could be used. Of course, div/vdiv, which take only 14 cycles or less, are a strong boost on CM4/33.

Oh, it divides only once every 8 pixels and interpolates in between, and still looks so good? I stand corrected.
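For anyone curious what "divide every 8 pixels and interpolate" looks like, here is a rough floating-point sketch for a single texture coordinate along one span. It is not the code from Quake's source (which works in fixed point and treats the tail of a span specially); the function names and the example span values are invented for illustration.

```python
def span_exact(u_over_z0, inv_z0, du_over_z, dinv_z, width):
    """Reference: perspective-correct u with one divide per pixel."""
    return [(u_over_z0 + x * du_over_z) / (inv_z0 + x * dinv_z)
            for x in range(width)]


def span_subdivided(u_over_z0, inv_z0, du_over_z, dinv_z, width, step=8):
    """One perspective divide per `step` pixels, linear interpolation in between.

    Returns (u values, number of perspective divides performed).
    """
    out = []
    u_left = u_over_z0 / inv_z0          # exact u at the start of the span
    divides = 1
    x = 0
    while x < width:
        n = min(step, width - x)         # last segment may be shorter
        x_right = x + n
        u_right = (u_over_z0 + x_right * du_over_z) / (inv_z0 + x_right * dinv_z)
        divides += 1
        # n is normally 8, so in fixed point this would be a shift, not a divide.
        du = (u_right - u_left) / n
        out.extend(u_left + i * du for i in range(n))
        u_left = u_right
        x = x_right
    return out, divides


# Example span: 64 pixels, depth z going from 1 to 4, u going from 0 to 64 texels.
args = (0.0, 1.0, 16.0 / 64, (0.25 - 1.0) / 64, 64)
exact = span_exact(*args)
approx, divides = span_subdivided(*args)
affine, _ = span_subdivided(*args, step=64)   # no perspective correction inside the span

print("divides:", len(exact), "vs", divides)
print("max error, 8-pixel steps:", max(abs(a - e) for a, e in zip(approx, exact)))
print("max error, fully affine: ", max(abs(a - e) for a, e in zip(affine, exact)))
```

The values are exact at every 8th pixel and only drift slightly in between, while interpolating linearly across the whole span is off by many texels in this example.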

By the way, it's "d_scan.c" for anyone who's trying to web search for it.

It means almost an order of magnitude fewer divisions (and fewer of the associated calculations as well).

Quake had to do this because a divide per pixel would have been too much, especially for a low-end Pentium, when it was released in 1996. And yes, it is not even noticeable, especially at low res.

Abrash did this in Quake because those divides are _free_ when interleaved with other code. The Pentium FPU is pipelined: you can issue an FDIV, then FXCH to other data and do something else for a while instead of waiting for the result. The price is hand-tuned assembly code that runs fast only on the Intel FPU of 1996. AMD finally caught up in 1998-99 by implementing pipelined FDIV and 0-cycle FXCH.

https://www.phatcode.net/res/224/files/html/ch63/63-02.html

Depends on the textures used. High contrast textures with vertical lines (e.g. dark wood on bright wall) would make the distortion very visible even at 320x200. However most of the game's textures are not like that.

There are some user-made maps, however, where this can be seen (e.g. I remember playing a map that was supposed to be inside a fantasy town, and it used a bunch of wood-on-wall textures that made the distortion apparent).