Reverse Engineering (Tutorial)


This tutorial deals with reverse engineering (RE).
It is organised as follows: In the introduction, the scope of the tutorial
is defined. After presenting the prerequisites, a systematic way to RE
is described - based on an exemplary executable[1] (credits to lepricaun),
that has been used among others in a thread about "cracking" here on AO[2].
The goal in this tutorial is to "crack" that executable, and hence is more
application-oriented than theoretical.


Introduction


RE is the process of taking something (a device, an electrical component,
a software program, etc.) apart and analyzing its workings in detail,
usually with the intention to construct a new device or program that
does the same thing without actually copying anything from the original[3].
Here, RE is introduced to craft code (in c) that is doing the same as the
original program.

RE can be used to perform cracking, which is defined as the "removal of
copy protection". This is criminal under certain circumstances, at least
in the US, due to the Digital Millennium Copyright Act (DMCA)[4,5]).

The above mentioned executable runs on MS Windows - hence, in this tutorial
we restrict ourselves to use MS Windows based applications/tools.

Final notes: Neither am I a professional programmer nor have I studied
IT. If I use terminology in the wrong context, please stay calm and correct
me. And have in mind: Neither am I native English speaking, nor do I life
in a country with English as its official language.


Prerequisites


The tools , we are going to use are
. Microsoft Visual C++ 6.0

With this program, we will create our new code. In addition, this environment
has a nice disassembler, which helps us to check our effort easily. If
not available I will mention an alternative way.

. PE Explorer [6]

PE Explorer will be our main tool to do the disassembly.

. OllyDbg 1.10 [7]

OllyDbg 1.10 is a very useful runtime debugger and disassembler.

. The windows calculator [8]

For calculations in the hexadecimal system and/or conversions to the
decimal system.


I cannot recommend books about RE (haven't read any), but these seem
to be quite popular:
. Hacker Disassembling Uncovered[9]
. Assembly Language Step-by-step: Programming with DOS and Linux[10]
. Online Resources[11]


Having the above tools available, we are ready.


Determination of the program structure


(file: crackit_7.asm)

The first step: We disassemble the executable using PE Explorer and look
at it. Usually, not all program parts are transparent. A RE process involves
a lot iterations of the whole systematic way presented in this tutorial.
After one iteration, you understand the code a bit more, and in a second
iteration you can correct errors and/or add new information to your notes.
Be patient.



- trick: Strings

In the disassembler view, there is a tab "Strings". The crackme's 1,2
and 3 can be solved immediately.

. extract the user-crafted code

(file: crackit_7_relevant.asm)

PE Explorer delimits connected code segments with a delimiter ";------...".
Check for
Code:
        push    ebp
        mov ebp,esp
        pop ebp     ; eventually, but most likely
since these commands initiate a function (like main()). There are quite
a few, but check the following code for the coder's specific information,
here like
Code:
        jmp L00401B70
        Align   8
 SSZ00401290_R0m3oD3l7a:
        db  'R0m3oD3l7a',0
Sometimes, this task is not so simple, since some compilers add a lot
of stuff to it (like the Microsoft one). Two hints: 1) Make a runtime debugging
using Ollydbg. Follow the code with F7 until you will see something that
is specific for the coder. 2) Here, at some point, you have to enter a
login ie there must be a function call to "gets", for example. Check the code
for the first appereance of "gets" and travel back carefully.

Follow the code until you stumble across the first
Code:
        mov ebx,[ebp-04h] ; eventually, but most likely
        leave
        retn
This ends the program.


. scan for memory regions that are used (variables, constants)

(file: crackit_7_ebp.asm)

It is very helpful to know where in the memory the variables, constants
are stored. The address of the relevant memory is in the basepointer ebp.
In asm, you access it with [ebp]. Data is usually stored in addresses
smaller than ebp, ie [ebp-04h] is typical for the first variable (in the
very simple program below "i"). Test it: With Visual C++ in debug mode,
compile the program
Code:
void main(){
int i=1;
}
jump into the code and press: CTRL-F11. You will see something like
Code:
0040D6D8   mov         dword ptr [ebp-4],1
which is i=1, as expected.
In case you do not have Visual C++, compile with your favorite compiler
and use PE Explorer or Ollydbg to look for the addresses.

In our project, we have to deal with quite a lot of variables. We expect
at least two sets: two char-arrays, one for the login, one for the password.
We know, that we have to type in the login/password, so let's check for
eg. "jmp_msvcrt.dll!gets".
Code:
        call    jmp_msvcrt.dll!gets
        lea eax,[ebp-00000088h]
Indeed: The login will be stored from [ebp-88h] up to [ebp-88h+length_of_loginstring-1].
We further find the position of the password:
Code:
        call    jmp_msvcrt.dll!gets
        lea eax,[ebp-000000F8h]
Check for all [ebp] in the relevant code segment, however keep in mind
that some are not important. Summary:


[ebp-0Ch] 0 (so far)
[ebp-10h] length_username (result of strlen is in EAX, check crackit_7_ebp.asm)
[ebp-14h] length_passsword (result of strlen is in EAX, check crackit_7_ebp.asm)

[ebp-88h] start of the login. The last one in that series around -88h is -7Ch.
... Let's introduce the notation: login=ABCDEFGHIJKLM, where A
[ebp-7Ch] is a label of [ebp-88h], and M is a label of [ebp-7Ch]

[ebp-F8h] F8h up to DEh is used - A 27-char password? (Is this guy crazy? ;D).
... It looks like. Let's introduce the notation: password=abcdefgh....xyz1,
[ebp-DEh] where a is a label of [ebp-F8h], and 1 is a label of [ebp-DEh]

[ebp-178h] There is another huge region. So far, it's content is unknown
...step 4... but you might have seen, some of it is used for login/password
[ebp-110h] testing purposes. eg.
Code:
cmp dword ptr [ebp-00000168h],00000027h
[ebp-00FCh] A pointer to a string "R0m3oD3l7a"

Please be aware, the relevance of certain regions in the memory are not
a priori clear. Just make a reasonable list, which you alter during the
whole RE.

. analyse the program structure

L00401280: Starting point of "main()". Some "strings" are defined.

SUB_L0040136D: Here, something happens. After reading in the login-string
(and getting the length) a bunch of comparisons take place, e.g.
Code:
        movzx   eax,[ebp-00000088h] ; A
        cmp al,[ebp-00000087h]  ; B
        jz  L0040155D           ; if (A==B) jump to L0040155D
All in the same manner. What is in L0040155D?
L0040155D: It reads in the password, but then jumps immediately to L00401709.
L0040157C: This also reads in the password, and then determines the length.
L004015AE: L004015C3: Seems kind of loop. Some copying or so?
L004015E7: A lot of comparisons, which seems to involve the login and the password.
L00401709: Output: a taunt! And then it jumps to L0040195F: The END!
Aha, this explains L0040155D.
L00401740: Jesus, another bunch of comparisons. Also with jumps to L00401709.
L0040180E: ...whatever...
L0040181C: And a final set of tests. Again with jumps to L00401709.
But if we pass all these, we are done:
Code:
        mov dword ptr [esp],SSZ00401330__Congratz__you_have_done_it____
        call    jmp_msvcrt.dll!printf
L0040195F: This is the end, my only friend, the end ...

- trick: cracking

(file: crackit_7_cracked.exe)

One sort of cracking would be to replace the above mentioned "jz L0040155D"
at position 004013E3 with "jz to_the_Congratz__you_have_done_it____", which is
at position 0040191D.

jz L0040155D ; = 0F84 7401 0000 ; jumps to 0040155D if A=B.

We want to jump to 0040191D. 0040191D - 004013E3 = 053A.
jump command itself takes 6 byes: 053A - 6 = 0534.

jnz 34050000 ; = 0F85 3405 0000 ; jumps to 0040191D if A != B (more possibilities)

For the hex-edit, use for example [12].
Of course, you can jump before the first comparison, such that not even
the prompt for a login appears.


Construction of an own framework


(file: main.c)

The second step: Using the accumulated understanding of the program,
we try to construct our own framework using Microsoft Visual C++.
What does that mean? We create a main.c-file that contains the same
structure, ie variables, constants, function-availability (like strlen,
printf, scanf), with all variables, constants at the same position in
memory (relative to the ebp). Let's try the following:
Code:
#include <stdio.h>
#include <string.h>

void main(){
    int i=0;int j=0;int k=0;
    int length_username;
    int length_password;

    char dummystring0[99];
    char username[13];
    char dummystring1[83];
    char password[27];
    char dummystring2[20];
    int what_is_this[27];

    sprintf(username,"ABCDEFGHIJKLM");
    sprintf(password,"abcdefghijklmnopqrstuvwxyz1");

    length_username=strlen(username);
    length_password=strlen(password);

    what_is_this[0]=0;
    what_is_this[1]=1;
    ...
With Visual C++ in debug mode, jump in the code and check with CTRL-F11
which memory-positions are used for length_username (should be [ebp-10h]),
..., what_is_this[0] (should be [ebp-178h]). Adjust the size of the
dummystrings correspondingly.

With this setup, we can copy 1-1 the assembler code from our disassembled
file into our main.c-file. This simplifies the following analysis.


Analysis of code segments


(file: crackit_7_analysis.asm, main.c)

The third step: Now, we start to work. Each code segment has to be looked
at in detail. Some are simple (in crackit_7_analysis.asm: Equation 1)
and can be RE directly, namely "Equation 1: A!=B" (remember our notation).
others are more tricky (in crackit_7_analysis.asm: Equation 2). While
the experienced programmer recognizes "Equation 2: F=M+M%2", a beginner
does not know it for sure. How to get it?


- trick: add asm in c (continuation of the above main.c)
Code:
// 2. Equation: F = M + M%2
// The following part is calculating "i+i%2"
    for (i=0;i<127;i++){
        __asm
        {
            mov dl, [ebp-04h]   ; copy the value of "i" into dl

; copy the asm-code to look at in here:

        mov al,dl
        sar al,07h
        shr al,07h
        add al,dl
        sar al,1
        add al,al
        sub dl,al
        mov al,dl
        movsx   edx,al
    ;   movsx   eax,[ebp-7Ch]   ; this has to be substituted
        movsx   eax,[ebp-04h]   ; this is the only substitution
        lea eax,[eax+edx]       ; the result is in eax


            mov  [ebp-08h],al   ; copy the result in al into "j"

        } // asm
        printf("%d:%d\t",i,j);  // print: initial value M -> function(M)
    }  // for i
Look at the output carefully. You can deduce i+i%2, eg.
i=6: i%2=0: j=i+i%2 = i = 6
i=7: i%2=1: j=i+i%2 = i +1 = 8
i=8: i%2=1: j=i+i%2 = i = 8

There is another example in main.c:
Code:
// 16. Equation: I%5 = 2
// The following part is calculating "i%5"
Since the crazy programmer has chosen a login of length 13 and a password
of length 27 (!), a lot of work is ahead of us. However, in this section
I will present all extraordinary constructions.

. the ominous [ebp-000000FCh]

Code:
; 8. Equation: L - 5Bh = strlen("R0m3oD3l7a") = Ah
        movsx   eax,[ebp-7Dh]
        lea ebx,[eax-5Bh]
        mov eax,[ebp-000000FCh] ; prepare for strlen
        mov [esp],eax
        call    jmp_msvcrt.dll!strlen
        cmp ebx,eax         ; result of strlen is in eax!
        jnz L0040155D
We put L (remember our notation for the login) into eax, and subtract
5Bh: ebx = L-5Bh. ebx is then compared with the length of that string
which address is stored at [ebp-000000FCh]: The length of "R0m3oD3l7a"
is 10 = Ah. Hence, the 8. Equation is L - 5Bh = Ah

. "[ebp-178h]" and the loop on L004015AE: L004015C3:
The meaning of the "[ebp-178h]"-array.

Code:
        mov edx,[ebp-0Ch]
        lea eax,[ebp-08h]
        add eax,[ebp-0Ch]
        sub eax,000000F0h
        movzx   eax,[eax]
        sar al,1
        movsx   eax,al
        mov [ebp+edx*4-00000178h],eax
In edx, there is a "running" variable (0,1,2,3...). The lea,add and sub
will lead us to the position [ebp-78h+ edx ] ie the password starting
with a up to 1 (remember our notation of the password). Sar al,1 then
divides by 2 and stores that value at [ebp+edx*4-00000178h]. The fact
that edx is multiplied with 4 tells us, that we are dealing with an
integer array (integer-size: 4 bytes, while a char has 1 byte).
This one is also illustrated in main.c.
Code:
    for (i=0;i<strlen(password);i++)
        what_is_this[i]=password[i]/2;
An example of using that array, see crackit_7_analysis.asm):
Code:
; 18. Equation: e/2 = 27h

; What's this ? [ebp-00000168h] = e/2 !
        cmp dword ptr [ebp-00000168h],00000027h
. L00401740 - goodies

There is a rather strange part about q of the password. For comments
see crackit_7_analysis.asm.
Code:
; 19. Equation: quite a few tests for q
        movsx   eax,[ebp-000000E8h]
        mov [ebp-00000184h],eax
        cmp dword ptr [ebp-00000184h],0000000Fh
        jz  L00401709
        cmp dword ptr [ebp-00000184h],0000000Fh
        jg  L0040180E
        jmp L00401709
 L0040180E:
        cmp dword ptr [ebp-00000184h],00000061h
        jz  L0040181C
        jmp L00401709
I am not sure about that one, but it denies a couple of possibilities
for q. That's all.

. L0040181C - goodies

A nice one is: 20. Equation: 2*x - x/2 = 92h, again see crackit_7_analysis.asm
for comments.

And FINALLY, a last one which is in main.c:
Code:
// 21. Equation: y = (J%3)+63h
// The following part is calculating "i%3"
As a result, we have a couple of equations, namely 53, plusminus some
around that strange part about q in the 19. Equation. Since we have
(13+27=40) variables only, that's enough hopefully.


Solving equations


(file: crackit_7_analysis.asm)

The fourth step: We know the procedure to check for a valid login/password.
Unfortunately, this does not necessarily mean that we already determined
the login/password. Here, we will solve the (non-)linear equations for the
unknowns.


As an examples, take the equations with login-values only (16 in the
first part). We just know K = 6Bh, L = 65h (note: Eq.8 is obsolete -
it was not necessary to RE that part to solve for the login) and H = 72h.

Warning: We have to be careful with B and A:
B/2 = 30h and A/4 = 13h. B can be 60h and 61h, as well as A=4Ch, 4Dh, 4Eh and 4Fh.

Eq.10: We then solve for M=72h.
Eq.17: I=60h or 61h, as we already know from (Eq.3) (good crosscheck nevertheless).
Eq.15 tells us, that I%5=2, hence I=61h. And so B=61h.
Eq.14 relates A and I: A=4Dh.
Eq.5: J=63h.
Eq.6: G=43h
Eq.1: F=72h, since M%2=0 F=M.

Use the ascii table[13]:

ABCDEFGHIJKLM
Ma...rCracker


- trick: educated guess

Finalising the login: Either we continue with the following 37 equations
(or so) with the determination, or we try an educated guess: The unused
Eq.4 tells us: D-C has a distance of 1 or 0 (E%2=0 or 1 only).
We start: Maaa.rCracker - hmmm, not likely. Maab.rCracker, Mabb.rCracker, ...,
Mass.rCracker, Mast.rCracker, Matt.rCracker, aaahh wait, ...

MasterCracker! Yes, it must be that one!
Actually, staring long enough at "Ma...rCracker", you probably just can
see it.

A lot of the remaining unknowns are quite easy to determine. Just have
in mind, that in principle you cannot be completely sure, that your
educated guesses for C, D and E are correct.
Using Ollydbg [7] you can test your progress easily, and you will even
see, which equation is not satisfied doing a step-by-step debugging.


Conclusion


(file: crackit_7.exe, crackit_7.c)

Code:
c:\>crackit_7.exe
Crackit_7.exe written by Scorpius, 2004

Enter username: MasterCracker

Enter password: Sorry_but_this_one_is_also_homework:)

Congratz, you have done it!!!

Username: MasterCracker
Password: Sorry_but_this_one_is_also_homework:)

The last step in a RE process is to construct the c-code. The starting
point is given with main.c. In this example, to put the rest is not too
difficult - essentially a set of strlen, printf, gets, for and ifs -
so I leave this as an exercise (replace the asm-constructs with the simpler
c-equivalent!). However, with the kind permission of lepricaun, I also
attached his original c-file (crackit_7.c). You will see, that some constructs
are not so easy, as they look after RE. This is due to the compiler - the
compiler optimises certain things just by removing. For example x*2/2=3, ie
the compiler just translates x=3 (note: x/2*2 cannot be optimised in such a way
due to the ambiguity doing a /2 with integers!).



I hope, you enjoyed this little introduction to RE. If you want to improve
your own RE skills, I propose the following way:
Take our current work, add some new c-construct to it, compile, disassemble
and try to understand what the compiler did, and why the asm code indeed
reflects the additional c-construct. After a while, you recognise certain
pieces at a glance.


List of references


[ 1] http://www.white-scorpion.nl/crackits/crackits.html
[ 2] http://www.antionline.com/showthread...hreadid=262718
[ 3] http://en.wikipedia.org/wiki/Reverse_engineering
[ 4] http://www.copyright.gov/legislation/dmca.pdf
[ 5] http://anti-dmca.org
[ 6] http://www.heaventools.com/download.htm
[ 7] http://home.t-online.de/home/Ollydbg
[ 8] Windows calculator in scientific mode
[ 9] http://www.amazon.com/exec/obidos/AS...623155-2902231
[10] http://www.amazon.com/exec/obidos/tg...glance&s=books
[11] http://www.acm.uiuc.edu/sigmil/RevEng/ (book, not finished), http://www.backerstreet.com/rec/rec.htm (tries to create c code)
[12] http://www.hhdsoftware.com/hexeditor.html
[13] http://www.asciitable.com/