>>Introduction<<
"Reverse Engineering" is a term that is overblown on the Internet, and social media has painted a clickbaity
picture of the concept to squeeze out whatever attention people of all age groups have left.
At a high level, reverse engineering is just trying to understand how something works
without access to its design or documentation. In the case of software (that is not open source), we try to look behind
the scenes and understand what the developer created. This starts with running the software and observing how it behaves
when you type on your keyboard, click on specific things, or just try to exit it with a key combination.
For the curious, however, this goes as deep as analyzing the assembly instructions that are running on your CPU, which boils down
to literally analyzing bytes of information. The source code may not be available, but the compiled product is still running
on YOUR machine, which you can take apart.
>>But why?<<
Analyzing bytes and looking at different flavors of assembly code may sound like a complete snoozefest to most people, but there
are (a few) practical reasons to try and do it anyway. This starts with IT security trying to understand what data a piece of malware encrypted
and sent over the network, and ends somewhere in Ghidra trying to understand why the hell thousands of lines are just MOV instructions
that move trash values around. Some may say that you can just decompile the compiled code, but this does not work reliably. Compiler
optimizations, stripped names, and deliberate obfuscation change the finished binary so much that the original source may be nearly impossible to reconstruct.
A good analogy is a cooked meal: you can't uncook meat or vegetables; the ingredients simply change in a one-hundred percent irreversible way.
>>Technicalities<<
What even is a "binary"? This is the deal: A binary is a file full of machine code, the raw 1's and 0's that your CPU actually executes. We as humans cannot possibly comprehend millions of lines of 1's and 0's, so we created assembly and, after that, high-level languages.
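To get a feeling for why nobody reads raw 1's and 0's, here is a minimal Python sketch that shows the same few bytes as text, as hex, and as bits. The string "hello world" is just borrowed from the example further down; nothing here is specific to any particular binary.

```python
# View the same bytes three ways: as text, as hex, and as raw bits.
data = "hello world".encode("ascii")

hex_view = data.hex(" ")                             # two hex digits per byte
bit_view = " ".join(f"{byte:08b}" for byte in data)  # eight bits per byte

print(data)
print(hex_view)
print(bit_view)
```

Eleven characters already turn into 88 bits, which is why hex dumps and disassembly are the usual way to look at a binary.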
So what actually is "compiling"? A compiler takes your high-level source code, e.g., written in C++, and converts it into machine code, which it stores in an "object file."
A linker then combines this object file with other pre-compiled object files that you may have used in your own source code, like libraries.
When writing code, you will often use functions you did not actually define in your own code. The linker takes these undefined symbols and
fills in the correct addresses in the instructions that reference them, outputting your final executable file (.exe, .elf). For example, imagine your code calls
an exported DLL function (if you are reading this, I hope you understand what a DLL is and does). When your source code is being compiled,
the function call generates an external function reference because the DLL is NOT a part of your source code.
The linker then adds the information necessary (an entry in the executable's import table) so that the loader can point the reference at the DLL code in memory when the process is started.
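A DLL is not guaranteed to load at the same base memory address every time (ASLR and relocation can move it), but everything inside it stays at a fixed offset from that base, which is an important factor for memory hacking.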
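For memory hacking, the practical consequence is that a location inside a module is usually described as "module base plus offset." A minimal Python sketch of that arithmetic follows; the base addresses and the offset are made-up values for illustration, not taken from any real DLL.

```python
def absolute_address(module_base: int, offset: int) -> int:
    """Address of something inside a loaded module: base plus offset."""
    return module_base + offset


# Hypothetical values, for illustration only.
FUNCTION_OFFSET = 0x1A40
base_first_run = 0x10000000
base_second_run = 0x6F3C0000

print(hex(absolute_address(base_first_run, FUNCTION_OFFSET)))
print(hex(absolute_address(base_second_run, FUNCTION_OFFSET)))
```

Once you know the offset, you only need to look up where the module was loaded to find the same function again.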
(Binaries get loaded from your hard disk into their own virtual address space, and your CPU then executes the code from there.)
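The linker's job of resolving undefined symbols can be sketched as a toy model in Python. This is heavily simplified and not how a real linker is implemented: real linkers patch instruction bytes via relocation entries, while this sketch only records which address each reference ends up with. The object file names, sizes, and offsets are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    size: int        # bytes of code in this object file
    defined: dict    # symbol name -> offset inside this object file
    undefined: list = field(default_factory=list)  # symbols used but not defined here


def link(objects, base=0x1000):
    """Lay the objects out back to back and resolve every undefined symbol."""
    addresses = {}
    cursor = base
    # Pass 1: give every defined symbol its final address.
    for obj in objects:
        for symbol, offset in obj.defined.items():
            if symbol in addresses:
                raise ValueError(f"duplicate symbol: {symbol}")
            addresses[symbol] = cursor + offset
        cursor += obj.size
    # Pass 2: every reference must resolve to one of those addresses.
    resolved = {}
    for obj in objects:
        for symbol in obj.undefined:
            if symbol not in addresses:
                raise KeyError(f"undefined reference to {symbol}")
            resolved[(obj.name, symbol)] = addresses[symbol]
    return resolved


# Hypothetical object files: main.o calls puts, libc.o defines it.
main_o = ObjectFile("main.o", size=0x40, defined={"main": 0x0}, undefined=["puts"])
libc_o = ObjectFile("libc.o", size=0x200, defined={"puts": 0x80})
print(link([main_o, libc_o]))
```

Leaving libc.o out of the list makes the sketch fail with an "undefined reference" error, which is the same kind of complaint a real linker gives you when a library is missing.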
            source code
                 │
                 │
                 │
           ┌──────────┐
           │ Compiler │
           └──────────┘
                 │
                 │
                 │
┌─────────────────────────────────┐
│.LC0:                            │
│    .string "hello world"        │
│main:                            │
│    LEA ECX, [ESP+4]             │
│    AND ESP, -16                 │
│    PUSH DWORD PTR [ECX-4]       │
│    PUSH EBP                     │
│    MOV EBP, ESP                 │
│    PUSH ECX                     │
│    SUB ESP, 4                   │
│    SUB ESP, 12                  │
│    PUSH OFFSET FLAT:.LC0        │
│    CALL PUTS                    │
│    ADD ESP, 16                  │
│    MOV EAX, 0                   │
│    MOV ECX, DWORD PTR [EBP-4]   │
│    LEAVE                        │
│    LEA ESP, [ECX-4]             │
│    RET                          │
│                                 │
└─────────────────────────────────┘
                 │
                 │
                 │
           ┌──────────┐
           │  Linker  │
           └──────────┘
                 │
                 │
                 │
┌─────────────────────────────────┐
│executable.exe                   │
│4d 5a 90 00 03 00 00 00-04 00 00 │
│00 ff ff 00 00 b8 00 00 00 00 00 │
│00 00 00-40 00 00 00 00 00 00 00 │
│...                              │
└─────────────────────────────────┘
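The first two bytes in the diagram above, 4d 5a, are the ASCII letters "MZ", the signature that Windows executables start with. Here is a minimal Python sketch that checks for it. The function names are my own, and finding the signature only tells you the file starts like a Windows executable, not that it is a valid one.

```python
# The bytes shown in the diagram above (the "-" in the dump is only a visual separator).
HEADER = bytes.fromhex(
    "4d 5a 90 00 03 00 00 00 04 00 00 "
    "00 ff ff 00 00 b8 00 00 00 00 00 "
    "00 00 00 40 00 00 00 00 00 00 00"
)


def has_mz_signature(data: bytes) -> bool:
    """True if the data starts with the two-byte 'MZ' magic."""
    return data[:2] == b"MZ"


def file_has_mz_signature(path: str) -> bool:
    """Read only the first two bytes of a file and check them."""
    with open(path, "rb") as handle:
        return has_mz_signature(handle.read(2))


print(has_mz_signature(HEADER))
print(has_mz_signature(b"\x7fELF"))  # an ELF file starts differently
```

Pointing file_has_mz_signature at any .exe on your machine is a cheap first step before opening it in a disassembler.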