by wzdd 1232 days ago
It seems less crazy when you look at the code. It seems like someone has gone through each function in a disassembly of the original and written the directly-corresponding C. Pick any module, for example, ending.c's Module18_GanonEmerges(), and you'll see lots of magic constants and direct SNES references like Intro_Clear1kbBlocksOfWRAM(). This is basically a C-ified disassembly of the original combined with half a SNES emulator.

Sounds like something ChatGPT could have done in a nearly fully automated fashion now.

EDIT: While downvoted, it is likely true. ChatGPT and GitHub Copilot are amazing at translating code between languages, especially at recovering readable code from minified code or inferring TypeScript typings from JavaScript.

Extraordinary claims require evidence. If this can be automated, give it a whirl and write a blog about it. I'm sure a LOT of people would love to see this.

Also, keep in mind that translating source code between two languages is fairly easy, while translating machine code to a compilable language is very hard. It requires knowledge of the machine itself, which may or may not exist. Maybe ChatGPT knows enough about the SNES to do this? Probably not.

Setting a register to some magic constant and then pushing it to some unknown address will do SOMETHING in the computer but you don't know what or why. It's honestly really messy.
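To make that concrete, here is a rough sketch of the problem. The two-instruction routine (LDA #$30 / STA $4000) is a made-up example, not taken from any game, and the bus array is just a stand-in for the console's address space. The register name and bit meanings in the second version follow the NES APU layout for $4000 (pulse 1 control); treat the names themselves as my own labels.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64 KiB console address space (assumption: no real hardware). */
static uint8_t bus[0x10000];

static void bus_write(uint16_t addr, uint8_t value) {
    bus[addr] = value;
}

/* Literal translation of the hypothetical "LDA #$30 / STA $4000".
   It is valid C, but says nothing about what the write is for. */
void mystery_routine(void) {
    uint8_t a = 0x30;
    bus_write(0x4000, a);
}

/* The same write once you know the hardware: on the NES, $4000 is the
   pulse 1 control register, laid out as DDLC VVVV. */
enum { APU_PULSE1_CTRL = 0x4000 };
enum {
    PULSE_HALT_LENGTH     = 0x20, /* L bit: halt the length counter */
    PULSE_CONSTANT_VOLUME = 0x10  /* C bit: use VVVV as a fixed volume */
};

void silence_pulse1(void) {
    /* Constant volume of 0, i.e. mute the first square channel. */
    bus_write(APU_PULSE1_CTRL, PULSE_HALT_LENGTH | PULSE_CONSTANT_VOLUME | 0x00);
}
```

Both functions store the same byte at the same address. Only the second tells you why, and that part comes from hardware documentation, not from the disassembly.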

I just quickly grabbed the assembly for Zelda1 here:

https://www.romhacking.net/forum/index.php?topic=32277.0

And then ChatGPT gave me this:

  #include <stdint.h>

  #define SONG_TABLE_LEN 32
  #define SONG_HEADER_LEN 8
  #define SONG_HEADER_ITEM_TAKEN_LEN 7
  #define SONG_HEADER_END_LEVEL_LEN 7
  #define SONG_HEADER_OVERWORLD_LEN 8
  #define SONG_HEADER_UNDERWORLD_LEN 8
  #define SONG_HEADER_LAST_LEVEL_LEN 8
  #define SONG_HEADER_GANON_LEN 8
  #define SONG_HEADER_ENDING_LEN 8

  const uint8_t SongTable[SONG_TABLE_LEN] = {
    0x7D, 0xB5, 0x6E, 0x67, 0x7D, 0xAD, 0x64, 0x64,
    0x75, 0x7D, 0x85, 0x95, 0x7D, 0x8D, 0x95, 0x9D,
    0xA5, 0xBD, 0xC5, 0xCD, 0xD5, 0xDD, 0xD5, 0xE5,
    0xED, 0x24, 0x2C, 0x34, 0x3C, 0x44, 0x34, 0x4C,
    0x54, 0x5C, 0x44, 0xF5
  };

  ...

  void DriveAudio(void) {
    unsigned char i;
    unsigned char Square0Note;
    unsigned char Square1Note;
    unsigned char TriangleNote;
    unsigned char NoiseNote;
    unsigned char Envelope;
    unsigned int NoteLength;

    Square0Note = SongScript[SongHeader[SongSelect][4]];
    Square1Note = Square0Note;
    TriangleNote = SongScript[SongHeader[SongSelect][3]];
    NoiseNote = SongScript[SongHeader[SongSelect][5]];
    Envelope = SongHeader[SongSelect][6];

    NoteLength = SongLength[SongScript[SongHeader[SongSelect][0]] + (Envelope & 0x07)];

    // Play square wave on channel 1
    Square1Note = Square0Note;
    if (Square1Note & 0x80) Square1Note = 0;
    Square1Note &= 0x3F;
    Square1Note |= (Envelope & 0x10) | 0x40;
    Square1Note &= 0x7F;
    for (i = 0; i < Square1Note; ++i) {
      // Output audio waveform data
    }

    // Play triangle wave on channel 2
    if (TriangleNote & 0x80) TriangleNote = 0;
    TriangleNote &= 0x7F;
    for (i = 0; i < TriangleNote; ++i) {
      // Output audio waveform data
    }

    // Play noise on channel 4
    if (NoiseNote & 0x80) NoiseNote = 0;
    NoiseNote &= 0x3F;
    NoiseNote |= (Envelope & 0x20) | 0x40;
    for (i = 0; i < NoiseNote; ++i) {
      // Output audio waveform data
    }

    // Wait for note length to complete
    for (i = 0; i < NoteLength; ++i) {
      // Wait
    }
  }
The places where it cannot fill in the details are there because the snippet calls subroutines whose bodies I didn't include in the prompt. Basically, if you take a recursive approach where you incrementally feed ChatGPT the subroutines and whatnot, it can reconstruct the whole thing.
I don't think that looks right. It looks like C code, but it is mostly just empty loops and missing references.

  for (i = 0; i < Square1Note; ++i) {
    // Output audio waveform data
  }
As I said, it is calling subroutines in the code snippet I asked it to translate. I didn't include all of the subroutines. Thus it has stubs for those. To do this properly, you need to basically create a graph of all the subroutines and then ask it to translate the leaf nodes first and slowly work your way up to the root nodes, or something like that. ChatGPT has length limitations for each prompt so you need to have a strategy to overcome those.
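For anyone who wants to try the leaf-first idea, here is a minimal sketch of how the ordering could be computed. It is a plain depth-first post-order walk over a call graph, not something from the original project. Apart from DriveAudio, the routine names and the call edges are hypothetical, and building the graph from a real disassembly is left out.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ROUTINES 16

/* calls[i][j] != 0 means routine i calls routine j. */
typedef struct {
    int count;
    const char *names[MAX_ROUTINES];
    unsigned char calls[MAX_ROUTINES][MAX_ROUTINES];
} CallGraph;

static void visit(const CallGraph *g, int node, unsigned char *seen,
                  int *order, int *n) {
    assert(node >= 0 && node < g->count);
    if (seen[node]) return;          /* already handled; also stops on cycles */
    seen[node] = 1;
    for (int callee = 0; callee < g->count; ++callee)
        if (g->calls[node][callee]) visit(g, callee, seen, order, n);
    order[(*n)++] = node;            /* emitted only after everything it calls */
}

/* Fills order[] with routine indices, callees before callers. Returns the count. */
int translation_order(const CallGraph *g, int *order) {
    unsigned char seen[MAX_ROUTINES] = {0};
    int n = 0;
    for (int i = 0; i < g->count; ++i) visit(g, i, seen, order, &n);
    return n;
}

/* Hypothetical graph: only DriveAudio is a name from the thread. */
CallGraph example_graph(void) {
    CallGraph g = {0};
    g.count = 4;
    g.names[0] = "DriveAudio";
    g.names[1] = "PlaySquare";       /* hypothetical */
    g.names[2] = "PlayNoise";        /* hypothetical */
    g.names[3] = "WriteApuRegister"; /* hypothetical */
    g.calls[0][1] = 1;
    g.calls[0][2] = 1;
    g.calls[1][3] = 1;
    g.calls[2][3] = 1;
    return g;
}

/* Prints the routines in the order they would be fed to the model. */
void print_order(const CallGraph *g, const int *order, int n) {
    for (int i = 0; i < n; ++i) printf("%d: %s\n", i + 1, g->names[order[i]]);
}
```

You would call translation_order once, then prompt with each routine in that order, pasting in the already-translated callees so each prompt stays under the length limit. Mutually recursive routines still come out in some order, but they would need to be translated together.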
That's pretty cool. I wonder if it would work well paired with FernFlower to decompile and de-obfuscate Java bytecode. I often find myself stuck with a JAR whose source was never pushed to Git, and I have to re-create it from decompiled source.
How did you get it to do that? I asked it to translate Z_00.asm to C and it said: "I'm sorry, I cannot translate a disassembly of code written in assembly language to C."
"Can you convert the following assembly code for the NES into C code?"

[cut and paste a bunch of code that fit within the ChatGPT buffer]

And it worked.

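SongTable is broken as written because the number of initializers (36) doesn’t match the declared length (32). A C++ compiler rejects it outright; in C it is a constraint violation that GCC and Clang typically only warn about, silently dropping the last four values.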
Hilarious. Yeah, it definitely inferred the wrong thing there. I checked the original code, and there is no length define at all, so the 32 is something it made up.
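One way to avoid this class of mistake, sketched below, is to leave the array length out and let the compiler count the initializers. The table values are copied from the generated snippet above. The two helper functions are hypothetical and exist only to show the derived length being used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Empty brackets: the compiler sizes the array from the initializer list. */
static const uint8_t SongTable[] = {
    0x7D, 0xB5, 0x6E, 0x67, 0x7D, 0xAD, 0x64, 0x64,
    0x75, 0x7D, 0x85, 0x95, 0x7D, 0x8D, 0x95, 0x9D,
    0xA5, 0xBD, 0xC5, 0xCD, 0xD5, 0xDD, 0xD5, 0xE5,
    0xED, 0x24, 0x2C, 0x34, 0x3C, 0x44, 0x34, 0x4C,
    0x54, 0x5C, 0x44, 0xF5
};

/* Derived from the data, so it cannot drift out of sync with it. */
#define SONG_TABLE_LEN (sizeof SongTable / sizeof SongTable[0])

/* Hypothetical helper: number of entries in the table. */
size_t song_table_len(void) {
    return SONG_TABLE_LEN;
}

/* Hypothetical helper: bounds-checked lookup. */
uint8_t song_table_at(size_t i) {
    assert(i < SONG_TABLE_LEN);
    return SongTable[i];
}
```

With this layout there is no separate length constant for a translator, human or otherwise, to get wrong.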