| HN Mirror


by byuu 2344 days ago

What's annoying about that memory space is that it's fragmented: $4300-$437f is the 128-byte region for DMA registers, but $43xc-$43xf aren't usable (well, $43xf mirrors $43xb for whatever reason; $43xc-$43xe are open bus).

So basically every 12 bytes, you get 4 bytes that are no longer usable before the next 12 bytes.
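To make the layout concrete, here is a rough Python sketch that enumerates the bytes holding their own value under the description above ($43x0-$43xb in each 16-byte slot). The helper names `usable_addresses` and `longest_run` are made up for illustration, and the model assumes all eight slots behave identically.

```python
BASE = 0x4300
CHANNELS = 8  # eight 16-byte slots cover $4300-$437f


def usable_addresses():
    """Addresses that store their own byte: offsets $0-$b of each slot."""
    usable = []
    for channel in range(CHANNELS):
        slot = BASE + channel * 0x10
        # Offsets $c-$e are open bus and $f mirrors $b, so they are skipped.
        usable.extend(slot + offset for offset in range(0x0C))
    return usable


def longest_run(addresses):
    """Length of the longest stretch of consecutive addresses."""
    best = run = 0
    previous = None
    for address in sorted(addresses):
        run = run + 1 if previous is not None and address == previous + 1 else 1
        best = max(best, run)
        previous = address
    return best


if __name__ == "__main__":
    addresses = usable_addresses()
    print(len(addresses), "usable bytes, longest contiguous run:", longest_run(addresses))
```

So under these assumptions, only 96 of the 128 bytes are usable, and no straight-line code sequence can be longer than 12 bytes without crossing a gap.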

I haven't actually looked at this game's code, but it's certainly clever if the author found a way to avoid having to perform unconditional jumps in there that would sacrifice most of the gains in performance.

3 comments

a1369209993 2343 days ago

I'm not sure the Ricoh 5A22 has large enough immediates for it, but there is an easy way to avoid a jump over a small amount of dead memory (example in x86):

  430B  3D -- -- -- --  cmp eax, dead32  # only affects flags
  4310  xx              dowhatever

See, e.g., http://www.muppetlabs.com/~breadbox/software/tiny/revisit.ht... , where it's used to skip over a mandatory header field.
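A toy illustration of why this works, as a Python sketch: a decoder that only knows instruction lengths treats the four dead bytes as the immediate operand of the `cmp`, so the next instruction it sees starts at 4310. The length table, the `nop` standing in for "dowhatever", and the placeholder byte values are all assumptions made for the example; this is not a real x86 decoder.

```python
# Hypothetical length table covering only the two opcodes used here.
INSTRUCTION_LENGTH = {
    0x3D: 5,  # cmp eax, imm32: one opcode byte plus a 4-byte immediate
    0x90: 1,  # nop, standing in for "dowhatever"
}


def instruction_starts(memory, start, end):
    """Walk memory from start, returning the address of each decoded instruction."""
    addresses = []
    pc = start
    while pc < end:
        addresses.append(pc)
        pc += INSTRUCTION_LENGTH[memory[pc]]
    return addresses


# 430B holds the cmp opcode; 430C-430F are the dead bytes (values are arbitrary).
memory = {
    0x430B: 0x3D,
    0x430C: 0xDE, 0x430D: 0xAD, 0x430E: 0xDE, 0x430F: 0xAD,
    0x4310: 0x90,
}

if __name__ == "__main__":
    print([hex(a) for a in instruction_starts(memory, 0x430B, 0x4311)])
```

The dead bytes are never decoded as opcodes, so their contents don't matter beyond clobbering the flags.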


FatalLogic 2344 days ago

An unconditional branch should be one of the fastest instructions. Why would that sacrifice most of the gains in performance?

It should cost about the performance gain from two of the effective instructions.


RetroSpark 2343 days ago

I used a debugger - it looks like the function in the DMA registers is actually just 32 bits:

  $4317 mvn src,dest
  $431a rts

Each of `src` and `dest` is either $7e or $7f, so this code performs a RAM-to-RAM memcpy.
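For anyone unfamiliar with `mvn`, here is a minimal Python model of what it does, assuming 16-bit index registers: A holds the byte count minus one, X the source offset, Y the destination offset, and the copy runs upward until A wraps to $ffff. The function name, the sparse dict used as memory, and the example addresses are illustrative, not taken from the game.

```python
def mvn(memory, src_bank, dest_bank, a, x, y):
    """Copy a+1 bytes from src_bank:x to dest_bank:y, ascending.

    memory is a dict mapping 24-bit addresses to byte values.
    Returns the final (a, x, y) register values.
    """
    while True:
        source = (src_bank << 16) | x
        destination = (dest_bank << 16) | y
        memory[destination] = memory.get(source, 0)
        x = (x + 1) & 0xFFFF  # offsets wrap within the bank
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:       # counter wrapped: the block move is done
            return a, x, y


if __name__ == "__main__":
    # Copy 4 bytes from $7e:1000 to $7f:2000 (addresses chosen arbitrarily).
    ram = {0x7E1000 + i: value for i, value in enumerate([0x11, 0x22, 0x33, 0x44])}
    print(mvn(ram, 0x7E, 0x7F, 3, 0x1000, 0x2000))
    print([hex(ram[0x7F2000 + i]) for i in range(4)])
```

With both banks limited to $7e or $7f, every combination stays inside work RAM, which is what makes it a RAM-to-RAM memcpy.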


byuu 2341 days ago

Okay, a) that's very clever, but b) mvn is really quite slow. DMA would be faster (presuming the data is on two separate buses; you can't perform RAM -> RAM DMAs). Barring that, a manually unrolled loop in a slow memory area would definitely beat out an mvn in a fast memory area.

I guess it's easy to judge this 25 years later with all we know now. That was a very cool idea to have implemented back then! Putting the mvn there would definitely be a boost compared to having the mvn be in a slow ROM area (6 master clock cycles per byte transferred.)
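One way to arrive at the 6-cycle figure, sketched in Python. The inputs are assumptions based on commonly cited SNES timings rather than anything measured here: 8 master clocks per access to slow ROM, 6 per access in the $43xx region, and the 3 instruction bytes of mvn being re-fetched for every byte moved.

```python
# All three constants are assumptions, not measurements.
SLOW_ACCESS = 8       # master clocks per fetch from slow ROM
FAST_ACCESS = 6       # master clocks per fetch from the $43xx region
FETCHES_PER_BYTE = 3  # mvn opcode plus two bank operands, re-fetched per byte


def fetch_saving_per_byte():
    """Master clocks saved per byte moved when mvn runs from the fast region."""
    return FETCHES_PER_BYTE * (SLOW_ACCESS - FAST_ACCESS)


if __name__ == "__main__":
    print(fetch_saving_per_byte(), "master clocks saved per byte transferred")
```

The reads and writes of the data itself cost the same either way, so only the instruction fetches contribute to the difference.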
