Thread: unassembly

  1. #1
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    Posts
    2,716

    unassembly

    I've been wanting to write this for a while but have been intimidated
    by the complexity of it. Not that this is a sophisticated project;
    on the contrary, I will demo an unassembly of an extremely simple
    MS-DOS COM-type executable. The problem is that doing it by hand
    at a low level still takes several steps.

    Naturally, the unassembly, or reverse engineering, of modern
    Windows executables would be done by much more sophisticated
    software and methods, but you wouldn't get to learn the basics
    if you just let some program do all the work.

    Here we go. Let's say you have a short (hopefully) MS-DOS executable file you
    want to unassemble. It must be the type with a COM extension. I happen to have one
    created just for this demo. Naturally, you have to be working in DOS, so open up
    a DOS window.

    You load the file into the DEBUG program:

    Code:
    C:\junk>debug hello.com
    -
    All you see at this point is the infamous hyphen prompt.
    DEBUG has the most terse and unforgiving syntax imaginable.
    Read a good tutorial at http://www.datainstitute.com/debug1.htm to learn its commands.

    Code:
    -d
    13B2:0100  B8 03 00 CD 10 BD 18 01-B8 00 13 BB 04 00 B9 0D   ................
    13B2:0110  00 BA 00 00 CD 10 CD 20-48 65 6C 6C 6F 2C 20 57   ....... Hello, W
    13B2:0120  6F 72 6C 64 21 54 3D 24-70 24 67 00 77 69 6E 62   orld!T=$p$g.winb
    13B2:0130  A3 1D D8 A3 26 D9 A3 33-DA A3 DB D6 A3 9D D5 A3   ....&..3........
    13B2:0140  56 D4 48 A3 2A DA A2 35-D3 A2 2D DA A2 21 D9 E8   V.H.*..5..-..!..
    13B2:0150  06 08 BE 0B 00 81 C6 C6-DB 8B 74 09 0B F6 75 03   ..........t...u.
    13B2:0160  E9 13 01 B3 2B C6 06 FD-D7 01 33 ED BF 1D D8 89   ....+.....3.....
    13B2:0170  36 EB D7 E8 FE 0C 9C FE-06 4B DB F6 C7 80 74 05   6........K....t.
    -
    This is the output from the d (dump) command. If you have ever used a hex
    editor, you will immediately recognize the format. On the far left are
    the memory addresses where the file is loaded, written as segment:offset.
    The second group of four digits, those 0100, 0110, etc., is more relevant
    to you than the first. These are the memory offsets, and they follow the
    order of the bytes in the file.

    0100 is the location in memory where the first byte of the file is loaded.
    The next field, consisting of sixteen pairs of characters, is a HEX DUMP.
    It is a listing of all of the bytes in the file, expressed in hexadecimal
    numbers. See a tut at http://whatis.techtarget.com/definit...212247,00.html
    if you need to know what they are.

    So, the first byte in the file is B8, the second is 03, and so forth.
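
    If you want to pull those fields apart outside of DEBUG, here is a minimal
    Python sketch (not part of the original demo). It splits one dump line into
    segment, offset and bytes. The helper name parse_dump_line is made up for
    illustration, and the fixed 47-column width of the hex field is an assumption
    based on the dump shown above.

```python
# Sketch: split one line of DEBUG "d" output into its fields.
# parse_dump_line is a hypothetical helper, not a DEBUG feature.

def parse_dump_line(line):
    # "13B2:0100" is segment:offset, both in hex.
    address, rest = line.split(None, 1)
    segment, offset = (int(part, 16) for part in address.split(":"))
    # Assumed layout: up to 16 byte pairs in 47 columns, with a "-"
    # instead of a space between the 8th and 9th pair.
    hex_field = rest[:47].replace("-", " ")
    data = bytes(int(pair, 16) for pair in hex_field.split())
    return segment, offset, data

line = "13B2:0100  B8 03 00 CD 10 BD 18 01-B8 00 13 BB 04 00 B9 0D   ................"
segment, offset, data = parse_dump_line(line)
print("segment %04X, offset %04X, first byte %02X" % (segment, offset, data[0]))
# In real mode the physical address is segment * 16 + offset.
print("physical address %05X" % (segment * 16 + offset))
```

    The last line uses the real-mode rule that a physical address is the
    segment times 16 plus the offset, which is why the offset is the part
    that tracks your position in the file.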

    The field on the right is an ASCII dump. Any bytes that correspond to
    normal characters show up here, and everything else is shown as a period.
    Look, the file has the words "Hello, World!" in it! Actually, this dump has
    a lot of extraneous stuff in it because DEBUG kept reading past the end of
    the file into whatever was already in memory. Now I happen to know that this
    file is only 37 bytes long, so I can tell DEBUG to dump that much and no more.

    Code:
    -d 100 l 25
    13B2:0100  B8 03 00 CD 10 BD 18 01-B8 00 13 BB 04 00 B9 0D   ................
    13B2:0110  00 BA 00 00 CD 10 CD 20-48 65 6C 6C 6F 2C 20 57   ....... Hello, W
    13B2:0120  6F 72 6C 64 21                                    orld!
    -
    That's better. That command was d (dump) 100 (for the correct offset) l (as in larry,
    and it means length) 25 (which is hex for 37, the length of the file).
    Learn all this syntax, and your friends will think you are a hacker.
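
    To double-check the hex arithmetic and the dump layout, here is a rough
    Python sketch (again, not part of the original demo). The bytes are copied
    from the dump above; the function name dump and the default segment 13B2
    are just assumptions taken from my session, since the segment will differ
    on your machine.

```python
# Sketch: convert the hex length and rebuild a DEBUG-style dump.
# The 37 bytes of the file, copied from the dump shown above.
IMAGE = bytes.fromhex(
    "B8 03 00 CD 10 BD 18 01 B8 00 13 BB 04 00 B9 0D "
    "00 BA 00 00 CD 10 CD 20 48 65 6C 6C 6F 2C 20 57 "
    "6F 72 6C 64 21"
)
LOAD_OFFSET = 0x100  # COM files are loaded at offset 0100

def dump(image, segment=0x13B2, start=LOAD_OFFSET):
    """Return dump lines in the same layout as DEBUG's d command."""
    lines = []
    for i in range(0, len(image), 16):
        chunk = image[i:i + 16]
        pairs = ["%02X" % b for b in chunk]
        hex_field = " ".join(pairs[:8])
        if len(pairs) > 8:
            hex_field += "-" + " ".join(pairs[8:])
        # Printable ASCII is shown as-is, everything else as a period.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append("%04X:%04X  %-47s   %s" % (segment, start + i, hex_field, text))
    return lines

length = int("25", 16)   # the "l 25" argument, read as hex
print(length)
for line in dump(IMAGE[:length]):
    print(line)
```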

    Now for the fun stuff.

    Code:
    -u 100 l 25
    13B2:0100 B80300        MOV     AX,0003
    13B2:0103 CD10          INT     10
    13B2:0105 BD1801        MOV     BP,0118
    13B2:0108 B80013        MOV     AX,1300
    13B2:010B BB0400        MOV     BX,0004
    13B2:010E B90D00        MOV     CX,000D
    13B2:0111 BA0000        MOV     DX,0000
    13B2:0114 CD10          INT     10
    13B2:0116 CD20          INT     20
    13B2:0118 48            DEC     AX
    13B2:0119 65            DB      65
    13B2:011A 6C            DB      6C
    13B2:011B 6C            DB      6C
    13B2:011C 6F            DB      6F
    13B2:011D 2C20          SUB     AL,20
    13B2:011F 57            PUSH    DI
    13B2:0120 6F            DB      6F
    13B2:0121 726C          JB      018F
    13B2:0123 64            DB      64
    13B2:0124 21543D        AND     [SI+3D],DX
    -
    This command, instead of dumping the file, uses the u (unassemble) command, and
    shows the assembly language instructions that the bytes correspond to.
    So, on the first line, the three bytes B8, 03, and 00, are interpreted by the
    processor as a single instruction. On the right, the instructions are written out.

    MOV AX,0003 is the first instruction in the file, and it means "move the
    hex numerical value 0003 into the AX register". Check this good tut
    http://telnet7.tripod.com/articles/8086_achitecture.htm for an
    understanding of the intel processor's architecture and the meaning of
    things like "registers". For our purposes, a register is a field inside
    the processor where numbers are put and manipulated.

    So the column with MOV, INT and so on holds the processor instructions, and the
    register names and numbers to the right of them are the operands the instructions work with.
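
    To see how three bytes like B8 03 00 turn into MOV AX,0003, here is a
    minimal Python sketch of a decoder. It is only an illustration: it knows
    just the two kinds of opcodes this file uses (B8 through BF for
    "MOV 16-bit register, immediate" and CD for INT), and the name decode_one
    is made up. Note that the 16-bit value is stored low byte first.

```python
import struct

# Register order used by the B8..BF opcodes (B8 = AX, B9 = CX, ...).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def decode_one(code, pos):
    """Decode the instruction at code[pos]; return (text, size in bytes)."""
    opcode = code[pos]
    if 0xB8 <= opcode <= 0xBF:
        # MOV r16, imm16: the immediate is little-endian (low byte first).
        (value,) = struct.unpack_from("<H", code, pos + 1)
        return "MOV     %s,%04X" % (REG16[opcode - 0xB8], value), 3
    if opcode == 0xCD:
        # INT imm8: one byte holding the interrupt number.
        return "INT     %02X" % code[pos + 1], 2
    # Anything else is outside this sketch; show it as a raw byte.
    return "DB      %02X" % opcode, 1

for sample in ("B80300", "CD10", "BD1801"):
    print(sample, "->", decode_one(bytes.fromhex(sample), 0)[0])
```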

    So what happened to the text? The file had the words "Hello, World!" in it,
    but they don't show up in this display. That's because, although DEBUG is quite
    powerful, it isn't very intelligent: it can't tell code from data, so our text
    is being interpreted as though it, too, were instructions.

    It just so happens that the last executable instruction in the file is at
    offset 0116, where the bytes CD 20 are an INT 20 instruction, which terminates
    the program and returns control to the operating system.
    So (I already knew this because I wrote the file) everything after that is our
    text, and DEBUG's interpretation of those bytes as instructions can be ignored.
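
    If you didn't write the file, you can still find the boundary by following
    the code until INT 20. Here is a rough Python sketch of that idea. It only
    works because this program is straight-line code using two kinds of
    opcodes; a real program with jumps and calls needs much more care. The
    function name walk is hypothetical. It also remembers the values loaded
    into registers, so BP and CX can be used to pull out the string.

```python
import struct

# The 37 bytes of the file, copied from the dump shown earlier.
IMAGE = bytes.fromhex(
    "B8 03 00 CD 10 BD 18 01 B8 00 13 BB 04 00 B9 0D "
    "00 BA 00 00 CD 10 CD 20 48 65 6C 6C 6F 2C 20 57 "
    "6F 72 6C 64 21"
)
LOAD_OFFSET = 0x100
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def walk(image):
    """Follow the code until INT 20; return (register values, code size)."""
    regs = {}
    pos = 0
    while pos < len(image):
        opcode = image[pos]
        if 0xB8 <= opcode <= 0xBF:          # MOV r16, imm16
            (value,) = struct.unpack_from("<H", image, pos + 1)
            regs[REG16[opcode - 0xB8]] = value   # keeps the last value loaded
            pos += 3
        elif opcode == 0xCD:                # INT imm8
            number = image[pos + 1]
            pos += 2
            if number == 0x20:              # INT 20 ends the program
                break
        else:
            raise ValueError("opcode %02X not handled by this sketch" % opcode)
    return regs, pos

regs, code_size = walk(IMAGE)
print("code ends at offset %04X" % (LOAD_OFFSET + code_size))
# BP holds the memory offset of the string, CX its length.
start = regs["BP"] - LOAD_OFFSET
text = IMAGE[start:start + regs["CX"]].decode("ascii")
print(text)
```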


    Code:
    MOV     AX,0003        ;AH=00 set video mode, AL=03 (also clears screen)
    INT     10             ;call BIOS video services
    MOV     BP,0118        ;offset of our text string (ES:BP points to it)
    MOV     AX,1300        ;AH=13 display a string, AL=00 write mode
    MOV     BX,0004        ;BH=00 video page, BL=04 display attribute (color)
    MOV     CX,000D        ;string length (0D hex = 13 characters)
    MOV     DX,0000        ;cursor location (DH=row 0, DL=column 0)
    INT     10             ;call BIOS video services
    INT     20             ;terminate program (and return to DOS)

    So here I have edited out everything but the instructions, and added comments.
    We are almost at the stage where we have source code we could use to reassemble
    the program (and that is the goal: to turn a program back into source code).

    Now DEBUG has a primitive assembler built into it, so we will fashion our
    source code to assemble with DEBUG.

    Code:
    N HELLO.COM
    E 118 "Hello, World!"
    A 100
    MOV     AX,0003        ;set video mode 0003 (also clears screen)
    INT     10             ;call BIOS video services 
    MOV     BP,0118        ;address of a buffer (for our text string)
    MOV     AX,1300        ;display a string
    MOV     BX,0004        ;display attribute (color)
    MOV     CX,000D        ;string length
    MOV     DX,0000        ;cursor location
    INT     10             ;call Bios video services
    INT     20             ;terminate program (and return to DOS)
    
    RCX
    25
    W
    Q
    Save this as hello.txt. Make sure your source file has a blank line (an extra
    carriage return) after the last assembly instruction, before the RCX command;
    the blank line is what tells DEBUG to leave assemble mode. There must also be a
    carriage return at the very end of the file. Comments may follow the assembly
    instructions on the same line, set off with a semicolon, but not on any other lines.
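
    If you would rather generate the script than type it, here is a minimal
    Python sketch. The helper name build_script is made up; the DEBUG commands
    are the ones from the listing above. It uses the 13-character string the
    program prints, works out the RCX value from the text offset and length,
    and ends every line with a DOS carriage return/line feed.

```python
# Sketch: build the DEBUG input script as text.
# build_script is a hypothetical helper, not part of DEBUG.
LOAD_OFFSET = 0x100

INSTRUCTIONS = [
    ("MOV     AX,0003", "set video mode 0003 (also clears screen)"),
    ("INT     10", "call BIOS video services"),
    ("MOV     BP,0118", "address of a buffer (for our text string)"),
    ("MOV     AX,1300", "display a string"),
    ("MOV     BX,0004", "display attribute (color)"),
    ("MOV     CX,000D", "string length"),
    ("MOV     DX,0000", "cursor location"),
    ("INT     10", "call BIOS video services"),
    ("INT     20", "terminate program (and return to DOS)"),
]

def build_script(name, text, text_offset):
    # File length = code bytes before the text + the text itself.
    length = text_offset - LOAD_OFFSET + len(text)
    lines = ["N %s" % name,
             'E %X "%s"' % (text_offset, text),
             "A %X" % LOAD_OFFSET]
    lines += ["%-23s;%s" % pair for pair in INSTRUCTIONS]
    lines.append("")                      # blank line leaves assemble mode
    lines += ["RCX", "%X" % length, "W", "Q"]
    return "\r\n".join(lines) + "\r\n"    # carriage return at the very end

script = build_script("HELLO.COM", "Hello, World!", 0x118)
print(script)
```

    Writing the result to hello.txt with open("hello.txt", "w", newline="")
    should keep the line endings exactly as built.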

    Now we type the following command:

    Code:
    C:\junk>debug<hello.txt
    and DEBUG gives you the following screen output.

    Code:
    -N HELLO.COM
    -E 118 "Hello, World!"
    -A 100
    138E:0100 MOV     AX,0003        ;set video mode 0003 (also clears screen)
    138E:0103 INT     10             ;call BIOS video services
    138E:0105 MOV     BP,0118        ;address of a buffer (for our text string)
    138E:0108 MOV     AX,1300        ;display a string
    138E:010B MOV     BX,0004        ;display attribute (color)
    138E:010E MOV     CX,000D        ;string length
    138E:0111 MOV     DX,0000        ;cursor location
    138E:0114 INT     10             ;call Bios video services
    138E:0116 INT     20             ;terminate program (and return to DOS)
    138E:0118
    -RCX
    CX 0000
    :25
    -W
    Writing 00025 bytes
    -Q
    
    C:\junk>

    Check a directory listing.


    Code:
    C:\junk>dir hello.com
    
     Volume in drive C is COMPAQ
     Volume Serial Number is 39BD-2F92
     Directory of C:\junk
    
    HELLO    COM            37  08/09/03  3:13a HELLO.COM
             1 file(s)             37 bytes
             0 dir(s)       11,114.50 MB free
    
    C:\junk>

    And a new file, HELLO.COM, has been created. Now, try executing that file.

    Code:
    C:\junk>hello
    Hello, World!

    It runs!
    I came in to the world with nothing. I still have most of it.

  2. #2
    Senior Member
    Join Date
    Jul 2003
    Posts
    634
    hey, cool tutorial,

    Out of interest, did you create the original .com file in assembly? And if so, do the original source and the final unassembled source match up identically?

    cheers

    i2c

  3. #3
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    Posts
    2,716
    The original com file was assembled by DEBUG from source written in notepad, so
    naturally it is completely reversible. Now, if you unassemble a program written
    by someone else, and don't have access to their source code, and if the program
    is a lot longer, then it is no longer a trivial task, but with patience you could arrive
    at source code that could produce the program.
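
    To illustrate the "completely reversible" part, here is a rough Python
    sketch of the round trip for the nine instructions in the demo. It is a toy
    that handles only MOV with a 16-bit register and an immediate, plus INT;
    the function names assemble and disassemble are made up, and this is not
    how DEBUG itself is implemented.

```python
import struct

REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def assemble(line):
    """Turn one source line into machine code bytes."""
    mnemonic, operands = line.split(None, 1)
    if mnemonic == "MOV":
        reg, value = operands.split(",")
        # Opcode B8+register number, then the value low byte first.
        return bytes([0xB8 + REG16.index(reg)]) + struct.pack("<H", int(value, 16))
    if mnemonic == "INT":
        return bytes([0xCD, int(operands, 16)])
    raise ValueError("not handled by this sketch: " + line)

def disassemble(code):
    """Turn machine code bytes back into source lines."""
    lines, pos = [], 0
    while pos < len(code):
        opcode = code[pos]
        if 0xB8 <= opcode <= 0xBF:
            (value,) = struct.unpack_from("<H", code, pos + 1)
            lines.append("MOV %s,%04X" % (REG16[opcode - 0xB8], value))
            pos += 3
        elif opcode == 0xCD:
            lines.append("INT %02X" % code[pos + 1])
            pos += 2
        else:
            raise ValueError("opcode %02X not handled by this sketch" % opcode)
    return lines

SOURCE = ["MOV AX,0003", "INT 10", "MOV BP,0118", "MOV AX,1300",
          "MOV BX,0004", "MOV CX,000D", "MOV DX,0000", "INT 10", "INT 20"]

code = b"".join(assemble(line) for line in SOURCE)
recovered = disassemble(code)
print(code.hex(" ").upper())
print(recovered == SOURCE)                                   # source survives the trip
print(b"".join(assemble(l) for l in recovered) == code)      # and so do the bytes
```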

  4. #4
    Senior Member
    Join Date
    Jul 2003
    Posts
    634
    Yeah, I thought that's what you would say. I was convinced that it was a fully reversible process if it was done in the same assembler program.

    The other reason there is loads of extra code when people write programs is that they put in loads of padding as some form of copy protection, although not a very good one.

  5. #5
    Senior Member
    Join Date
    Jan 2003
    Posts
    3,915
    Nice tutorial, more helpful than just showing some assembly commands, that's for sure. I was wondering if you could post the original .com (unless I'm blind and didn't notice it), so that we could play with that and follow the demo through on our own. I was also curious about the registers. Is AX always your display, basically? How do all the nX registers work? Is DX always the cursor?

  6. #6
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    Posts
    2,716
    here's the original file

  7. #7
    Junior Member
    Join Date
    Aug 2003
    Posts
    10
    Isn't it called "disassembling"?

  8. #8
    Here are some books I have on assembly:
    -Assembly Language for Intel-Based Computers
    *had to order this one because I think it is a college textbook, 4ed. is out now, also comes with masm on CD (if you get it used like I did, hopefully you'll get that with it!)
    -Art of Assembly Language Programming
    *Lots of good information.
    -Assembly Language Step-by-Step By
    *uses NASM assembler, included with it. Won't make you an exceptional assembly programmer, but uses very good metaphors to help you pick up the concepts if you know pretty much nothing about hex/registers/segments, etc.
    -Revolutionary Guide To Assembly Language
    *Never read more than the first couple of chapters before getting another book so can't comment on effectiveness, but the format was confusing to me.

    When starting to learn assembly, the problems I encountered included not being able to find an assembler, since I didn't know of DEBUG at the time and no books included one (which is why I bought Assembly Language for Intel-Based Computers). I also came into it with the preconception of programming in a 32-bit environment, but no books started on that, which is understandable now. When they say it's hard to learn, it can be, but once you get past the initial blocks, like trying to figure out what in the world the authors are talking about, it gets better. Along the way you'll learn about computer architecture in more depth too!

  9. #9
    Forgot to leave the comment about the tutorial...I was born with ROM, ha ha, n/m. OK, back to the point. It used DEBUG, which was good because there was nothing to download (excluding your COM file, if chosen). Also, glad you went over how viewable character data and executable data look the same in memory. I'm still new to assembly, so the reinforcement of that point was helpful because I never paid attention to it before when trying to dissect something (i.e. dumping then unassembling, which, by the way, I have heard called disassembling too). But since the command in DEBUG is u for unassemble (I think), it seems relevant to the subject.
    If you can't explain something to a six-year-old, you really don't understand it yourself - Albert Einstein
    If life is supposed to be a gift, how come I have to give it back?
    If I die and get frozen, and am brought back later in the future, will the life insurance companies want their money back from the beneficiaries?
