Security_RNRF
0x16. Body
16. Reverse engineering C programs - bin.0x10
: Content
-> function calls
-> variables
-> if, for-/while-loop
: Create a simple C program and analyze the assembly code.
-> vim test.c
-> #include <stdio.h>
int main(){
printf("Test\n");
}
-> gcc test.c -o test
-> r2 test
> aaa (Error?)
> s main
> pdf
;-- main:
/ (fcn) sym.main 23
| sym.main ();
| ; DATA XREF from 0x0000054d (entry0)
| 0x0000063a 55 push rbp
| 0x0000063b 4889e5 mov rbp, rsp
| 0x0000063e 488d3d9f0000. lea rdi, qword str.Test ; 0x6e4 ; "Test"
| 0x00000645 e8c6feffff call sym.imp.puts ; int puts(const char *s)
| 0x0000064a b800000000 mov eax, 0
| 0x0000064f 5d pop rbp
\ 0x00000650 c3 ret
: Download the sample code from GitHub and analyze the simple programs in it.
-> git clone https://github.com/LiveOverflow/liveoverflow_youtube.git
-> One is about variables and data types.
-> One is about function calls.
-> The third is about control flow, such as loops.
-> Start with "variables.c".
: vim variables.c
-> #include <stdint.h>
#define XXX __asm__("nop"); // "XXX" expands to an inline-assembly "nop" instruction.
// The "nop" instructions show up later in the disassembly,
// so it is easy to see which line of C code produced which instructions.
// a small struct
struct r {
uint64_t r1; // Tip: fixed-width types such as uint64_t give each field a known size.
uint32_t r2;
};
int main() {
// different datatypes in C
XXX;
volatile int a = 0x1234;
XXX;
volatile unsigned int b = 0x1234;
XXX;
volatile uint32_t c = 0x1234;
XXX;
volatile uint64_t d = 0x1234;
XXX;
volatile int e = -0x1234;
XXX;
volatile unsigned int f = -0x1234;
XXX;
volatile float g = 0;
XXX;
volatile float h = 12.34;
XXX;
volatile float i = -12.34;
XXX;
volatile double j = 0;
XXX;
volatile double k = 12.34;
XXX;
volatile double l = -12.34;
XXX;
volatile uint32_t m[10] = {0x0, 0x1, 0x22, 0x333, 0x4444};
XXX;
volatile uint32_t m2 = m[2];
XXX;
volatile char n = 'A';
XXX;
volatile uint8_t o = 'B'; // a character moved into an integer?
XXX;
volatile const char *p = "AAAA";
XXX;
volatile char *q = "BBBB";
XXX;
XXX;
XXX;
// the struct
volatile struct r s = {0};
XXX;
s.r1 = 0x41414141414141;
XXX;
s.r2 = 0x414141;
XXX;
XXX;
XXX;
// f is 64bit. So what happens on 32bit?
f += 0x4141414141;
XXX;
int t = a++;
XXX;
int u = ++a;
XXX;
XXX;
XXX;
return 0;
}
: Tip: refer to this page (http://matt.sh/howto-c) to learn more about writing C correctly.
-> Since the repository includes a "Makefile", you can simply type "make" in the terminal to compile everything.
-> A "Makefile" is a small script that defines how a project is compiled.
-> make
gcc -std=c99 functions.c -m32 -o functions32
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/stdint.h:9:0,
from functions.c:1:
/usr/include/stdint.h:26:10: fatal error: bits/libc-header-start.h: No such file or directory
#include <bits/libc-header-start.h>
^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:6: recipe for target 'functions32' failed
make: *** [functions32] Error 1
-> Compiling a 32-bit version with "-m32" on a 64-bit machine fails here because the 32-bit headers are missing.
-> You must first install the 32-bit C library development package.
-> sudo apt-get install libc6-dev-i386
-> make
gcc -std=c99 functions.c -m32 -o functions32
gcc -std=c99 functions.c -o functions64
gcc -std=c99 variables.c -m32 -o variables32
gcc -std=c99 variables.c -o variables64
gcc -std=c99 control_flow.c -o control_flow
gcc -std=c99 example.c -o example
: Let's disassemble the 32-bit and 64-bit versions of the code by opening them side by side in "gdb".
-> vim variables.c
-> gdb variables32
-> gdb variables64
-> set disassembly-flavor intel
-> disassemble main
-> First, all local variables are stored somewhere on the stack.
-> correction!
-> 64-bit: variables are addressed relative to the base pointer (= rbp)
-> 32-bit: variables are addressed relative to the stack pointer (= esp)
-> Next, you can see that the assembly code has no notion of negative numbers.
-> A negative value is simply a bit pattern that starts with "fff…" (two's complement).
-> There is one difference between the 32-bit and the 64-bit code.
-> To store a 64-bit value, the 32-bit code needs two moves.
volatile uint64_t d = 0x1234;
64bit: 0x00000000000006a6 <+60>: mov QWORD PTR [rbp-0xa0],0x1234
32bit: 0x0000059b <+78>: mov DWORD PTR [ebp-0x98],0x1234
0x000005a5 <+88>: mov DWORD PTR [ebp-0x94],0x0
-> Floating-point values are also interesting: the constant is stored elsewhere in the program, and the value is then copied from there into the local variable.
-> The array is also interesting. It is declared with 10 elements, but only the first five are given explicit initial values.
-> These values are first stored on the stack and then copied from that position on the stack into the array's position.
-> So the values are not written directly into the array.
-> You can tell that a character is just a byte.
-> 32-bit: 0x0000067d <+304>: mov BYTE PTR [ebp-0xce],0x41
-> 64-bit: 0x000000000000079d <+307>: mov BYTE PTR [rbp-0xce],0x41
-> It doesn't matter whether it is declared as "char" or as an 8-bit unsigned integer (uint8_t).
-> A string is referenced by its address.
-> 32-bit: 0x00000693 <+326>: mov DWORD PTR [ebp-0xa8],eax
-> 64-bit: 0x00000000000007b4 <+330>: mov QWORD PTR [rbp-0x80],rax
-> Thus, the local variable is not a character array.
-> The local variable only holds the address that points to the string.
: Next, look at "control_flow".
-> radare2 control_flow
> aaa # "aa" means "analyze all"; "aaa" runs additional analysis commands on top of "aa".
> s sym.main # "s" (seek) sets the current position, so "s sym.main" moves to the main function.
> VV # "VV" (two capital letters) switches to the visual graph view.
-> First, the variable is set to "0".
mov dword [local_4h], 0
-> Then comes the "if".
mov eax, dword [local_4h]
cmp eax, 0xff
jle 0x614;[ga]
-> The local variable is loaded into a register and compared with hexadecimal "ff".
-> If it is less than or equal to 0xff, the code jumps.
-> Then comes the "while" loop.
0x642 ;[gf]
; JMP XREF from 0x00000636 (main)
mov eax, dword [local_4h]
cmp eax, 9
jle 0x638;[gh]
-> The local variable is loaded into the register again and compared, and the code jumps into the loop body.
0x638 ;[gh]
; JMP XREF from 0x00000648 (main)
nop
mov eax, dword [local_4h]
add eax, 1
mov dword [local_4h], eax
-> Inside the block, the value is loaded again, incremented and written back.
-> Now compare this with the "for" loop.
-> It starts by setting the variable to "0".
-> Then it checks whether the loop condition still holds.
-> Inside the loop block, you can see the "NOP".
-> At the end of the block, the variable is incremented by one.
-> In conclusion, it is the same as the "while" loop.
: Next, look at "functions".
-> In the 64-bit version, zero is moved into "eax" before the call.
-> Otherwise the function calls look the same, except for the addresses.
-> Without "ASLR", 64-bit code typically sits at addresses starting with hexadecimal 40, and 32-bit code at addresses starting with hexadecimal 80.
-> clarification!
64-bit: 40058e
32-bit: 804846c
-> This is helpful because you can immediately tell that an address starting with "40" points into code.
-> The next function returns a value, which is stored in a variable.
-> In both the 32-bit and the 64-bit version, the value is taken from the "eax" register.
-> So the return value is clearly passed back through "eax".
-> The next one, "fun3", takes a parameter.
-> In the 32-bit version, the value is loaded from somewhere and stored at the top of the stack, and then the function is called.
-> In the 64-bit version, however, the value is loaded into the "edi" register.
-> This is the first big difference.
-> A 64-bit function is called with its parameter in a register, while a 32-bit function gets its parameter on the stack.
-> The next function takes two parameters. Again, the 32-bit version places the values on the stack.
-> The first parameter is at the top of the stack, the second one right after it.
-> The 64-bit version uses "edi" and "esi" instead.
-> One question: what does 64-bit code do if there are too many parameters for the registers?
-> For 32-bit, all parameters are stored on the stack, with the first parameter at the top and the last one farthest from it.
-> For 64-bit, the first parameters are stored in the registers "edi", "esi", "edx", "ecx"…
-> Starting with the seventh parameter, however, they are stored on the stack as well.
: After this video, you can recognize a variety of common assembly patterns.
-> You don't always need a decompiler.
-> The more programs you disassemble, the easier these patterns become to recognize.
-> The same approach works in "Hopper", "radare" and "gdb", even though each tool displays the code in a slightly different syntax.
-> Example: AT&T assembler syntax is different from Intel syntax.
: Next time, let's practice reversing a program.