How to learn reverse engineering... Good question.
First of all, free interactive tools like Ghidra drastically speed up the reverse engineering process nowadays. In many cases, Ghidra's decompiler can produce reasonably readable C-like source code. So, if you want to jump right in, do the following:
- Write a small C program and compile it
- Install Ghidra
- Load the binary of your program into Ghidra
- Try to make sense of what you see :)
That's a rather cumbersome way to learn, though, even if several YouTube videos follow that approach. Modern decompilers are good, but they are still far from perfect, especially on highly optimized machine code. Often, the decompiler output only makes sense when you have the assembly code next to it. Therefore, I highly recommend first learning to program in assembly, which includes understanding how code and data are represented in main memory, how the stack works, how OS functions are called, etc. The good news: once you understand how one particular CPU and computer architecture work, you can easily transfer that knowledge to other architectures. So maybe just start with x86 if your computer has an AMD or Intel CPU: http://download-mirror.savannah.gnu.org/releases/pgubook/Pro...
Once you can read and write assembly language, go back to Ghidra and disassemble some programs for which you have the source code. That will teach you a lot about how compilers translate high-level languages into machine code. You will later discover that hand-written machine code sometimes looks very different from compiler-generated code.
Then there is what I call the platform-specific part, which really depends on the type of programs you want to reverse engineer. For example, if you want to reverse engineer a text adventure written for MS-DOS, the only platform-specific things you will encounter in the code are the four or five BIOS or DOS calls needed for printing text, reading the keyboard, etc. You can easily look those up on the Internet without any deeper knowledge of MS-DOS or IBM-compatible PCs. On the other hand, if you are working on an Amiga game, it helps to be familiar with the hardware and to recognize the typical instruction sequences needed to query the state of the mouse buttons, etc.