Reverse Engineering (Tutorial)
Results 1 to 9 of 9

Thread: Reverse Engineering (Tutorial)

  1. #1
    Senior Member
    Join Date
    Mar 2004
    Posts
    557

    Reverse Engineering (Tutorial)

    Reverse Engineering (Tutorial)


    This tutorial deals with reverse engineering (RE).
    It is organised as follows: In the introduction, the scope of the tutorial
    is defined. After presenting the prerequisites, a systematic way to RE
    is described - based on an exemplary executable[1] (credits to lepricaun),
    that has been used among others in a thread about "cracking" here on AO[2].
    The goal in this tutorial is to "crack" that executable, and hence is more
    application-oriented than theoretical.


    Introduction


    RE is the process of taking something (a device, an electrical component,
    a software program, etc.) apart and analyzing its workings in detail,
    usually with the intention to construct a new device or program that
    does the same thing without actually copying anything from the original[3].
    Here, RE is introduced to craft code (in c) that is doing the same as the
    original program.

    RE can be used to perform cracking, which is defined as the "removal of
    copy protection". This is criminal under certain circumstances, at least
    in the US, due to the Digital Millennium Copyright Act (DMCA)[4,5]).

    The above mentioned executable runs on MS Windows - hence, in this tutorial
    we restrict ourselves to use MS Windows based applications/tools.

    Final notes: Neither am I a professional programmer nor have I studied
    IT. If I use terminology in the wrong context, please stay calm and correct
    me. And have in mind: Neither am I native English speaking, nor do I life
    in a country with English as its official language.


    Prerequisites


    The tools , we are going to use are
    . Microsoft Visual C++ 6.0

    With this program, we will create our new code. In addition, this environment
    has a nice disassembler, which helps us to check our effort easily. If
    not available I will mention an alternative way.

    . PE Explorer [6]

    PE Explorer will be our main tool to do the disassembly.

    . OllyDbg 1.10 [7]

    OllyDbg 1.10 is a very useful runtime debugger and disassembler.

    . The windows calculator [8]

    For calculations in the hexadecimal system and/or conversions to the
    decimal system.


    I cannot recommend books about RE (haven't read any), but these seem
    to be quite popular:
    . Hacker Disassembling Uncovered[9]
    . Assembly Language Step-by-step: Programming with DOS and Linux[10]
    . Online Resources[11]


    Having the above tools available, we are ready.


    Determination of the program structure


    (file: crackit_7.asm)

    The first step: We disassemble the executable using PE Explorer and look
    at it. Usually, not all program parts are transparent. A RE process involves
    a lot iterations of the whole systematic way presented in this tutorial.
    After one iteration, you understand the code a bit more, and in a second
    iteration you can correct errors and/or add new information to your notes.
    Be patient.



    - trick: Strings

    In the disassembler view, there is a tab "Strings". The crackme's 1,2
    and 3 can be solved immediately.

    . extract the user-crafted code

    (file: crackit_7_relevant.asm)

    PE Explorer delimits connected code segments with a delimiter ";------...".
    Check for
    Code:
            push    ebp
            mov ebp,esp
            pop ebp     ; eventually, but most likely
    since these commands initiate a function (like main()). There are quite
    a few, but check the following code for the coder's specific information,
    here like
    Code:
            jmp L00401B70
            Align   8
     SSZ00401290_R0m3oD3l7a:
            db  'R0m3oD3l7a',0
    Sometimes, this task is not so simple, since some compilers add a lot
    of stuff to it (like the Microsoft one). Two hints: 1) Make a runtime debugging
    using Ollydbg. Follow the code with F7 until you will see something that
    is specific for the coder. 2) Here, at some point, you have to enter a
    login ie there must be a function call to "gets", for example. Check the code
    for the first appereance of "gets" and travel back carefully.

    Follow the code until you stumble across the first
    Code:
            mov ebx,[ebp-04h] ; eventually, but most likely
            leave
            retn
    This ends the program.


    . scan for memory regions that are used (variables, constants)

    (file: crackit_7_ebp.asm)

    It is very helpful to know where in the memory the variables, constants
    are stored. The address of the relevant memory is in the basepointer ebp.
    In asm, you access it with [ebp]. Data is usually stored in addresses
    smaller than ebp, ie [ebp-04h] is typical for the first variable (in the
    very simple program below "i"). Test it: With Visual C++ in debug mode,
    compile the program
    Code:
    void main(){
    int i=1;
    }
    jump into the code and press: CTRL-F11. You will see something like
    Code:
    0040D6D8   mov         dword ptr [ebp-4],1
    which is i=1, as expected.
    In case you do not have Visual C++, compile with your favorite compiler
    and use PE Explorer or Ollydbg to look for the addresses.

    In our project, we have to deal with quite a lot of variables. We expect
    at least two sets: two char-arrays, one for the login, one for the password.
    We know, that we have to type in the login/password, so let's check for
    eg. "jmp_msvcrt.dll!gets".
    Code:
            call    jmp_msvcrt.dll!gets
            lea eax,[ebp-00000088h]
    Indeed: The login will be stored from [ebp-88h] up to [ebp-88h+length_of_loginstring-1].
    We further find the position of the password:
    Code:
            call    jmp_msvcrt.dll!gets
            lea eax,[ebp-000000F8h]
    Check for all [ebp] in the relevant code segment, however keep in mind
    that some are not important. Summary:


    [ebp-0Ch] 0 (so far)
    [ebp-10h] length_username (result of strlen is in EAX, check crackit_7_ebp.asm)
    [ebp-14h] length_passsword (result of strlen is in EAX, check crackit_7_ebp.asm)

    [ebp-88h] start of the login. The last one in that series around -88h is -7Ch.
    ... Let's introduce the notation: login=ABCDEFGHIJKLM, where A
    [ebp-7Ch] is a label of [ebp-88h], and M is a label of [ebp-7Ch]

    [ebp-F8h] F8h up to DEh is used - A 27-char password? (Is this guy crazy? ;D).
    ... It looks like. Let's introduce the notation: password=abcdefgh....xyz1,
    [ebp-DEh] where a is a label of [ebp-F8h], and 1 is a label of [ebp-DEh]

    [ebp-178h] There is another huge region. So far, it's content is unknown
    ...step 4... but you might have seen, some of it is used for login/password
    [ebp-110h] testing purposes. eg.
    Code:
    cmp dword ptr [ebp-00000168h],00000027h
    [ebp-00FCh] A pointer to a string "R0m3oD3l7a"

    Please be aware, the relevance of certain regions in the memory are not
    a priori clear. Just make a reasonable list, which you alter during the
    whole RE.

    . analyse the program structure

    L00401280: Starting point of "main()". Some "strings" are defined.

    SUB_L0040136D: Here, something happens. After reading in the login-string
    (and getting the length) a bunch of comparisons take place, e.g.
    Code:
            movzx   eax,[ebp-00000088h] ; A
            cmp al,[ebp-00000087h]  ; B
            jz  L0040155D           ; if (A==B) jump to L0040155D
    All in the same manner. What is in L0040155D?
    L0040155D: It reads in the password, but then jumps immediately to L00401709.
    L0040157C: This also reads in the password, and then determines the length.
    L004015AE: L004015C3: Seems kind of loop. Some copying or so?
    L004015E7: A lot of comparisons, which seems to involve the login and the password.
    L00401709: Output: a taunt! And then it jumps to L0040195F: The END!
    Aha, this explains L0040155D.
    L00401740: Jesus, another bunch of comparisons. Also with jumps to L00401709.
    L0040180E: ...whatever...
    L0040181C: And a final set of tests. Again with jumps to L00401709.
    But if we pass all these, we are done:
    Code:
            mov dword ptr [esp],SSZ00401330__Congratz__you_have_done_it____
            call    jmp_msvcrt.dll!printf
    L0040195F: This is the end, my only friend, the end ...

    - trick: cracking

    (file: crackit_7_cracked.exe)

    One sort of cracking would be to replace the above mentioned "jz L0040155D"
    at position 004013E3 with "jz to_the_Congratz__you_have_done_it____", which is
    at position 0040191D.

    jz L0040155D ; = 0F84 7401 0000 ; jumps to 0040155D if A=B.

    We want to jump to 0040191D. 0040191D - 004013E3 = 053A.
    jump command itself takes 6 byes: 053A - 6 = 0534.

    jnz 34050000 ; = 0F85 3405 0000 ; jumps to 0040191D if A != B (more possibilities)

    For the hex-edit, use for example [12].
    Of course, you can jump before the first comparison, such that not even
    the prompt for a login appears.


    Construction of an own framework


    (file: main.c)

    The second step: Using the accumulated understanding of the program,
    we try to construct our own framework using Microsoft Visual C++.
    What does that mean? We create a main.c-file that contains the same
    structure, ie variables, constants, function-availability (like strlen,
    printf, scanf), with all variables, constants at the same position in
    memory (relative to the ebp). Let's try the following:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    void main(){
        int i=0;int j=0;int k=0;
        int length_username;
        int length_password;
    
        char dummystring0[99];
        char username[13];
        char dummystring1[83];
        char password[27];
        char dummystring2[20];
        int what_is_this[27];
    
        sprintf(username,"ABCDEFGHIJKLM");
        sprintf(password,"abcdefghijklmnopqrstuvwxyz1");
    
        length_username=strlen(username);
        length_password=strlen(password);
    
        what_is_this[0]=0;
        what_is_this[1]=1;
        ...
    With Visual C++ in debug mode, jump in the code and check with CTRL-F11
    which memory-positions are used for length_username (should be [ebp-10h]),
    ..., what_is_this[0] (should be [ebp-178h]). Adjust the size of the
    dummystrings correspondingly.

    With this setup, we can copy 1-1 the assembler code from our disassembled
    file into our main.c-file. This simplifies the following analysis.


    Analysis of code segments


    (file: crackit_7_analysis.asm, main.c)

    The third step: Now, we start to work. Each code segment has to be looked
    at in detail. Some are simple (in crackit_7_analysis.asm: Equation 1)
    and can be RE directly, namely "Equation 1: A!=B" (remember our notation).
    others are more tricky (in crackit_7_analysis.asm: Equation 2). While
    the experienced programmer recognizes "Equation 2: F=M+M%2", a beginner
    does not know it for sure. How to get it?


    - trick: add asm in c (continuation of the above main.c)
    Code:
    // 2. Equation: F = M + M%2
    // The following part is calculating "i+i%2"
        for (i=0;i<127;i++){
            __asm
            {
                mov dl, [ebp-04h]   ; copy the value of "i" into dl
    
    ; copy the asm-code to look at in here:
    
            mov al,dl
            sar al,07h
            shr al,07h
            add al,dl
            sar al,1
            add al,al
            sub dl,al
            mov al,dl
            movsx   edx,al
        ;   movsx   eax,[ebp-7Ch]   ; this has to be substituted
            movsx   eax,[ebp-04h]   ; this is the only substitution
            lea eax,[eax+edx]       ; the result is in eax
    
    
                mov  [ebp-08h],al   ; copy the result in al into "j"
    
            } // asm
            printf("%d:%d\t",i,j);  // print: initial value M -> function(M)
        }  // for i
    Look at the output carefully. You can deduce i+i%2, eg.
    i=6: i%2=0: j=i+i%2 = i = 6
    i=7: i%2=1: j=i+i%2 = i +1 = 8
    i=8: i%2=1: j=i+i%2 = i = 8

    There is another example in main.c:
    Code:
    // 16. Equation: I%5 = 2
    // The following part is calculating "i%5"
    Since the crazy programmer has chosen a login of length 13 and a password
    of length 27 (!), a lot of work is ahead of us. However, in this section
    I will present all extraordinary constructions.

    . the ominous [ebp-000000FCh]

    Code:
    ; 8. Equation: L - 5Bh = strlen("R0m3oD3l7a") = Ah
            movsx   eax,[ebp-7Dh]
            lea ebx,[eax-5Bh]
            mov eax,[ebp-000000FCh] ; prepare for strlen
            mov [esp],eax
            call    jmp_msvcrt.dll!strlen
            cmp ebx,eax         ; result of strlen is in eax!
            jnz L0040155D
    We put L (remember our notation for the login) into eax, and subtract
    5Bh: ebx = L-5Bh. ebx is then compared with the length of that string
    which address is stored at [ebp-000000FCh]: The length of "R0m3oD3l7a"
    is 10 = Ah. Hence, the 8. Equation is L - 5Bh = Ah

    . "[ebp-178h]" and the loop on L004015AE: L004015C3:
    The meaning of the "[ebp-178h]"-array.

    Code:
            mov edx,[ebp-0Ch]
            lea eax,[ebp-08h]
            add eax,[ebp-0Ch]
            sub eax,000000F0h
            movzx   eax,[eax]
            sar al,1
            movsx   eax,al
            mov [ebp+edx*4-00000178h],eax
    In edx, there is a "running" variable (0,1,2,3...). The lea,add and sub
    will lead us to the position [ebp-78h+ edx ] ie the password starting
    with a up to 1 (remember our notation of the password). Sar al,1 then
    divides by 2 and stores that value at [ebp+edx*4-00000178h]. The fact
    that edx is multiplied with 4 tells us, that we are dealing with an
    integer array (integer-size: 4 bytes, while a char has 1 byte).
    This one is also illustrated in main.c.
    Code:
        for (i=0;i<strlen(password);i++)
            what_is_this[i]=password[i]/2;
    An example of using that array, see crackit_7_analysis.asm):
    Code:
    ; 18. Equation: e/2 = 27h
    
    ; What's this ? [ebp-00000168h] = e/2 !
            cmp dword ptr [ebp-00000168h],00000027h
    . L00401740 - goodies

    There is a rather strange part about q of the password. For comments
    see crackit_7_analysis.asm.
    Code:
    ; 19. Equation: quite a few tests for q
            movsx   eax,[ebp-000000E8h]
            mov [ebp-00000184h],eax
            cmp dword ptr [ebp-00000184h],0000000Fh
            jz  L00401709
            cmp dword ptr [ebp-00000184h],0000000Fh
            jg  L0040180E
            jmp L00401709
     L0040180E:
            cmp dword ptr [ebp-00000184h],00000061h
            jz  L0040181C
            jmp L00401709
    I am not sure about that one, but it denies a couple of possibilities
    for q. That's all.

    . L0040181C - goodies

    A nice one is: 20. Equation: 2*x - x/2 = 92h, again see crackit_7_analysis.asm
    for comments.

    And FINALLY, a last one which is in main.c:
    Code:
    // 21. Equation: y = (J%3)+63h
    // The following part is calculating "i%3"
    As a result, we have a couple of equations, namely 53, plusminus some
    around that strange part about q in the 19. Equation. Since we have
    (13+27=40) variables only, that's enough hopefully.


    Solving equations


    (file: crackit_7_analysis.asm)

    The fourth step: We know the procedure to check for a valid login/password.
    Unfortunately, this does not necessarily mean that we already determined
    the login/password. Here, we will solve the (non-)linear equations for the
    unknowns.


    As an examples, take the equations with login-values only (16 in the
    first part). We just know K = 6Bh, L = 65h (note: Eq.8 is obsolete -
    it was not necessary to RE that part to solve for the login) and H = 72h.

    Warning: We have to be careful with B and A:
    B/2 = 30h and A/4 = 13h. B can be 60h and 61h, as well as A=4Ch, 4Dh, 4Eh and 4Fh.

    Eq.10: We then solve for M=72h.
    Eq.17: I=60h or 61h, as we already know from (Eq.3) (good crosscheck nevertheless).
    Eq.15 tells us, that I%5=2, hence I=61h. And so B=61h.
    Eq.14 relates A and I: A=4Dh.
    Eq.5: J=63h.
    Eq.6: G=43h
    Eq.1: F=72h, since M%2=0 F=M.

    Use the ascii table[13]:

    ABCDEFGHIJKLM
    Ma...rCracker


    - trick: educated guess

    Finalising the login: Either we continue with the following 37 equations
    (or so) with the determination, or we try an educated guess: The unused
    Eq.4 tells us: D-C has a distance of 1 or 0 (E%2=0 or 1 only).
    We start: Maaa.rCracker - hmmm, not likely. Maab.rCracker, Mabb.rCracker, ...,
    Mass.rCracker, Mast.rCracker, Matt.rCracker, aaahh wait, ...

    MasterCracker! Yes, it must be that one!
    Actually, staring long enough at "Ma...rCracker", you probably just can
    see it.

    A lot of the remaining unknowns are quite easy to determine. Just have
    in mind, that in principle you cannot be completely sure, that your
    educated guesses for C, D and E are correct.
    Using Ollydbg [7] you can test your progress easily, and you will even
    see, which equation is not satisfied doing a step-by-step debugging.


    Conclusion


    (file: crackit_7.exe, crackit_7.c)

    Code:
    c:\>crackit_7.exe
    Crackit_7.exe written by Scorpius, 2004
    
    Enter username: MasterCracker
    
    Enter password: Sorry_but_this_one_is_also_homework:)
    
    Congratz, you have done it!!!
    
    Username: MasterCracker
    Password: Sorry_but_this_one_is_also_homework:)

    The last step in a RE process is to construct the c-code. The starting
    point is given with main.c. In this example, to put the rest is not too
    difficult - essentially a set of strlen, printf, gets, for and ifs -
    so I leave this as an exercise (replace the asm-constructs with the simpler
    c-equivalent!). However, with the kind permission of lepricaun, I also
    attached his original c-file (crackit_7.c). You will see, that some constructs
    are not so easy, as they look after RE. This is due to the compiler - the
    compiler optimises certain things just by removing. For example x*2/2=3, ie
    the compiler just translates x=3 (note: x/2*2 cannot be optimised in such a way
    due to the ambiguity doing a /2 with integers!).



    I hope, you enjoyed this little introduction to RE. If you want to improve
    your own RE skills, I propose the following way:
    Take our current work, add some new c-construct to it, compile, disassemble
    and try to understand what the compiler did, and why the asm code indeed
    reflects the additional c-construct. After a while, you recognise certain
    pieces at a glance.


    List of references


    [ 1] http://www.white-scorpion.nl/crackits/crackits.html
    [ 2] http://www.antionline.com/showthread...hreadid=262718
    [ 3] http://en.wikipedia.org/wiki/Reverse_engineering
    [ 4] http://www.copyright.gov/legislation/dmca.pdf
    [ 5] http://anti-dmca.org
    [ 6] http://www.heaventools.com/download.htm
    [ 7] http://home.t-online.de/home/Ollydbg
    [ 8] Windows calculator in scientific mode
    [ 9] http://www.amazon.com/exec/obidos/AS...623155-2902231
    [10] http://www.amazon.com/exec/obidos/tg...glance&s=books
    [11] http://www.acm.uiuc.edu/sigmil/RevEng/ (book, not finished), http://www.backerstreet.com/rec/rec.htm (tries to create c code)
    [12] http://www.hhdsoftware.com/hexeditor.html
    [13] http://www.asciitable.com/
    If the only tool you have is a hammer, you tend to see every problem as a nail.
    (Abraham Maslow, Psychologist, 1908-70)

  2. #2
    Nice, very nice! but perhaps you should have made it a little easier to understand , i've written the program, and even i only understand about 25% of what you are talking about

    - trick: Strings

    In the disassembler view, there is a tab "Strings". The crackme's 1,2
    and 3 can be solved immediately.
    for other people reading this: sec_ware is refering to the other crackits at my site when talking about the first 3.


    Well sec_ware, you've done a GREAT job, wish i could give you more greenies, but i have to assign AP's to others first

  3. #3
    Senior Member
    Join Date
    Mar 2004
    Posts
    557
    Hi lepricaun,

    thanks for the comments! And don't worry about the greenies - do words not count
    as much, if not even more?
    I was aware, that the level of that tutorial was somewhat high, since it requires not only
    knowledge about asm, some math-skills, but also about bit-manipulations in detail, like

    note: x/2*2 cannot be optimised in such a way
    due to the ambiguity doing a /2 with integers!
    though, I have to admit, that one, one could write in simpler words ....

    It's not an easy-over-a-cup-of-tea-reader. If you have any specific question,
    I gladly try to answer it!

    Cheers

  4. #4
    well, i wasn't really talking about the code, but about your comments

    well, as soon as i have the time to learn it, i will centainly go through your tutorial one line at the time, but unfortunately that will only happen in the weekends..

    but i'll let you know where i get stuck in your tut

  5. #5
    Computer Forensics
    Join Date
    Jul 2001
    Posts
    672
    Holy crap. This is possibly the best tutorial I have ever seen on antionline.

    Lepricaun: great work on the executables..they were a pleasant distraction at work. Keep posting as you learn, it makes this place interesting.

    sec_ware: excellent information. Out of curiosity how long did it take you to reverse #7?
    Antionline in a nutshell
    \"You\'re putting the fate of the world in the hands of a bunch of idiots I wouldn\'t trust with a potato gun\"

    Trust your Technolust

  6. #6
    Senior Member
    Join Date
    Mar 2004
    Posts
    557
    Thank you guys for the nice feedback I got for this tutorial.
    It makes me feel "Hey, it was worth the time I spent" -
    I guess, lepricaun has the same impression with his executables

    hogfly, to crack #7 took me a whole evening (starting at 8pm, ending ... ),
    to write the tutorial took about the same time (I already had quite a few notes and
    comments from my cracking evening - albeit not as systematic as it is presented
    in the tutorial ... )!
    If the only tool you have is a hammer, you tend to see every problem as a nail.
    (Abraham Maslow, Psychologist, 1908-70)

  7. #7
    sec_ware --> yes i feel the same way

    hogfly --> thanks for the compliment, i will surely post some more as i progress in knowledge..
    b.t.w, i see your a computer forensic, can you point me to some tutorials / tools of which can help me learn more about forensics?

  8. #8
    Computer Forensics
    Join Date
    Jul 2001
    Posts
    672
    Lepricaun: Check out the forensics forum, it's got a fair amount of information gathering there..and it's only getting better.
    Antionline in a nutshell
    \"You\'re putting the fate of the world in the hands of a bunch of idiots I wouldn\'t trust with a potato gun\"

    Trust your Technolust

  9. #9
    yes i know, i visit it from time to time, and i'm also a member of the forensics@bugtraq.com mailing list, but still haven't found some really helpful tutorials, only some tools which cost a LOT of money (Encase) etc.
    i also have an evaluation version of forensic toolkit, and i like the tool so i think i will buy a copy of it

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •