-
August 9th, 2003, 09:23 AM
#1
unassembly
I've been wanting to write this for a while but have been intimidated
by the complexity of it. Not that this is a sophisticated project;
on the contrary, I will demo an unassembly of an extremely simple
MS-DOS COM-type executable. The problem is that doing it by hand
at a low level still involves several steps.
Naturally, the unassembly, or reverse engineering, of modern
Windows executables would be done by much more sophisticated
software and methods, but you wouldn't get to learn the basics
if you just let some program do all the work.
Here we go. Let's say you have a short (hopefully) MS-DOS executable file you
want to unassemble. It must be the type with a COM extension. I happen to have one
created just for this demo. Naturally, you have to be working in DOS, so open up
a DOS window.
You load the file into the DEBUG program:
Code:
C:\junk>debug hello.com
-
All you see at this point is the infamous hyphen prompt.
DEBUG has the most terse and unforgiving syntax imaginable.
Read a good tutorial at http://www.datainstitute.com/debug1.htm to learn its commands.
Code:
-d
13B2:0100 B8 03 00 CD 10 BD 18 01-B8 00 13 BB 04 00 B9 0D ................
13B2:0110 00 BA 00 00 CD 10 CD 20-48 65 6C 6C 6F 2C 20 57 ....... Hello, W
13B2:0120 6F 72 6C 64 21 54 3D 24-70 24 67 00 77 69 6E 62 orld!T=$p$g.winb
13B2:0130 A3 1D D8 A3 26 D9 A3 33-DA A3 DB D6 A3 9D D5 A3 ....&..3........
13B2:0140 56 D4 48 A3 2A DA A2 35-D3 A2 2D DA A2 21 D9 E8 V.H.*..5..-..!..
13B2:0150 06 08 BE 0B 00 81 C6 C6-DB 8B 74 09 0B F6 75 03 ..........t...u.
13B2:0160 E9 13 01 B3 2B C6 06 FD-D7 01 33 ED BF 1D D8 89 ....+.....3.....
13B2:0170 36 EB D7 E8 FE 0C 9C FE-06 4B DB F6 C7 80 74 05 6........K....t.
-
This is the output from the d (dump) command. If you have ever used a hex
editor, you will immediately recognize the format. On the far left are
the memory addresses where the file is loaded, written as segment:offset. The
four digits after the colon, those 0100, 0110, etc., are more relevant to you
than the segment in front of them.
These are memory offsets corresponding to the order of the bytes in
the file.
0100 is the location in memory where the first byte of the file is loaded; DOS
always loads a COM file at offset 0100.
The next field, consisting of sixteen pairs of characters, is a HEX DUMP.
It is a listing of all of the bytes in the file, expressed in hexadecimal
numbers. See a tut at http://whatis.techtarget.com/definit...212247,00.html
if you need to know what they are.
So, the first byte in the file is B8, the second is 03, and so forth.
The field on the right is an ASCII dump. Any of the bytes that correspond
to normal characters will show here. Look, the file has the words "Hello, World!"
in it! Actually, this dump has a lot of extraneous stuff in it because DEBUG
kept reading past the end of the file into whatever happened to be in memory
(a bare d shows 128 bytes). Now I happen to know that this file is
only 37 bytes long, so I can tell DEBUG to only read that much and no more.
Code:
-d 100 l 25
13B2:0100 B8 03 00 CD 10 BD 18 01-B8 00 13 BB 04 00 B9 0D ................
13B2:0110 00 BA 00 00 CD 10 CD 20-48 65 6C 6C 6F 2C 20 57 ....... Hello, W
13B2:0120 6F 72 6C 64 21 orld!
-
That's better. That command was d (dump) 100 (for the correct offset) l (as in larry,
and it means length) 25 (which is hex for 37, the length of the file).
Learn all this syntax, and your friends will think you are a hacker.
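If you want to see the dump format spelled out rather than take my word for it, here is a minimal Python sketch that prints the same three columns for the 37 bytes shown above. It is not part of DEBUG; the function name dump and its parameters are my own invention, and the segment value 13B2 is simply copied from my session (yours will differ).

```python
# The 37 bytes of HELLO.COM, copied from the DEBUG dump above.
HELLO = bytes.fromhex(
    "B8 03 00 CD 10 BD 18 01 B8 00 13 BB 04 00 B9 0D "
    "00 BA 00 00 CD 10 CD 20 48 65 6C 6C 6F 2C 20 57 "
    "6F 72 6C 64 21"
)


def dump(data, base=0x100, segment=0x13B2):
    """Return DEBUG-style lines: segment:offset, up to 16 hex bytes, ASCII column.

    base and segment are illustrative defaults taken from the session above.
    """
    lines = []
    for start in range(0, len(data), 16):
        row = data[start:start + 16]
        left = " ".join(f"{b:02X}" for b in row[:8])
        right = " ".join(f"{b:02X}" for b in row[8:])
        # DEBUG puts a hyphen between the 8th and 9th byte of a row.
        hex_part = left + ("-" + right if right else "")
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
        lines.append(f"{segment:04X}:{base + start:04X}  {hex_part:<47}  {text}")
    return lines


for line in dump(HELLO):
    print(line)

# The length DEBUG wants is hexadecimal: 37 decimal is 25 hex.
print(len(HELLO), format(len(HELLO), "X"))
```

The last line is the same conversion I did in my head for the l 25 argument.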
Now for the fun stuff.
Code:
-u 100 l 25
13B2:0100 B80300 MOV AX,0003
13B2:0103 CD10 INT 10
13B2:0105 BD1801 MOV BP,0118
13B2:0108 B80013 MOV AX,1300
13B2:010B BB0400 MOV BX,0004
13B2:010E B90D00 MOV CX,000D
13B2:0111 BA0000 MOV DX,0000
13B2:0114 CD10 INT 10
13B2:0116 CD20 INT 20
13B2:0118 48 DEC AX
13B2:0119 65 DB 65
13B2:011A 6C DB 6C
13B2:011B 6C DB 6C
13B2:011C 6F DB 6F
13B2:011D 2C20 SUB AL,20
13B2:011F 57 PUSH DI
13B2:0120 6F DB 6F
13B2:0121 726C JB 018F
13B2:0123 64 DB 64
13B2:0124 21543D AND [SI+3D],DX
-
This command, instead of dumping the file, uses the u (unassemble) command, and
shows the assembly language instructions that the bytes correspond to.
So, on the first line, the three bytes B8, 03, and 00, are interpreted by the
processor as a single instruction. On the right, the instructions are written out.
MOV AX,0003 is the first instruction in the file, and it means "move the
hex numerical value 0003 into the AX register". Check this good tut
http://telnet7.tripod.com/articles/8086_achitecture.htm for an
understanding of the intel processor's architecture and the meaning of
things like "registers". For our purposes, a register is a small storage slot inside
the processor where numbers are put and manipulated.
So the column with MOV, INT and so on holds the processor instructions, and the
numbers to the right of them are the operands those instructions work on.
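To make the byte-to-instruction mapping concrete, here is a small Python sketch of a decoder that understands only the two instruction forms this program uses: MOV of a 16-bit immediate into a register (opcodes B8 through BF) and INT (opcode CD). It is nowhere near a real disassembler, and the names decode and REG16 are hypothetical. Note that the 16-bit value is stored low byte first, which is why B8 03 00 reads as MOV AX,0003.

```python
# 16-bit registers in the order the 8086 numbers them (0 = AX ... 7 = DI).
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# The executable part of HELLO.COM: the first 24 bytes from the dump.
CODE = bytes.fromhex(
    "B8 03 00 CD 10 BD 18 01 B8 00 13 BB 04 00 B9 0D "
    "00 BA 00 00 CD 10 CD 20"
)


def decode(code, offset=0x100):
    """Decode MOV r16,imm16 and INT imm8 only; stop at any other opcode."""
    listing = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF and i + 2 < len(code):
            # The immediate is little-endian: low byte first, then high byte.
            imm = code[i + 1] | (code[i + 2] << 8)
            listing.append((offset + i, f"MOV {REG16[op - 0xB8]},{imm:04X}"))
            i += 3
        elif op == 0xCD and i + 1 < len(code):
            listing.append((offset + i, f"INT {code[i + 1]:02X}"))
            i += 2
        else:
            break  # unknown opcode: this toy decoder gives up here
    return listing


for address, text in decode(CODE):
    print(f"{address:04X}  {text}")
```

The register is picked by the low three bits of the opcode, so B8 is AX, B9 is CX, BA is DX, BB is BX and BD is BP, matching the listing above.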
So what happened to the text? The file had the words "Hello, World!" in it,
but they don't show up in this display. That's because, although DEBUG is quite
powerful, it isn't very intelligent: it cannot tell data from code, so our text
is being interpreted as though it, too, were instructions.
It just so happens that the last executable instruction in the file is at
offset 0116, and the bytes CD 20 are an INT 20 instruction, which terminates
the program, returning control to the operating system.
So, (I already knew this because I wrote the file), everything after that is our
text, and DEBUG's interpretation of those bytes as instructions can be ignored.
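Here is a rough Python sketch of that split, assuming (as is true for this file, but not in general) that the first CD 20 byte pair really is the INT 20 instruction and not part of some operand. The variable names are my own. It also checks one of the bogus instructions: the bytes 72 6C at offset 0121 are just the letters "rl", but read as code they are a JB with an 8-bit displacement, which is where DEBUG's JB 018F came from.

```python
# The 37 bytes of HELLO.COM, copied from the DEBUG dump.
HELLO = bytes.fromhex(
    "B8 03 00 CD 10 BD 18 01 B8 00 13 BB 04 00 B9 0D "
    "00 BA 00 00 CD 10 CD 20 48 65 6C 6C 6F 2C 20 57 "
    "6F 72 6C 64 21"
)
LOAD = 0x100              # COM files are loaded at offset 0100
INT_20 = b"\xCD\x20"      # encoding of INT 20

# Assumption: the first CD 20 in the file is the terminating instruction.
end_of_code = HELLO.find(INT_20) + len(INT_20)
code, text = HELLO[:end_of_code], HELLO[end_of_code:]

print(f"code: {LOAD:04X}-{LOAD + end_of_code - 1:04X}")
print(f"text starts at {LOAD + end_of_code:04X}: {text.decode('ascii')!r}")

# A short jump lands at (address of next instruction) + displacement.
# The instruction is 2 bytes long; 6C is positive as a signed byte.
jb_at, displacement = 0x0121, 0x6C
jb_target = jb_at + 2 + displacement
print(f"JB {jb_target:04X}")
```

The text offset it reports, 0118, is the same value the program loads into BP.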
Code:
MOV AX,0003 ;set video mode 0003 (also clears screen)
INT 10 ;call BIOS video services
MOV BP,0118 ;address of a buffer (for our text string)
MOV AX,1300 ;display a string
MOV BX,0004 ;display attribute (color)
MOV CX,000D ;string length
MOV DX,0000 ;cursor location
INT 10 ;call Bios video services
INT 20 ;terminate program (and return to DOS)
So here I have edited out everything but the instructions, and added comments.
We are almost at the stage where we have source code we could use to recompile
the program (and that is the goal: to turn a program back into source code).
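A word on how the comments above were arrived at. Each 16-bit register has a high and a low byte (AX is AH and AL, and so on), and the BIOS video service reads its arguments from those halves: AH selects the function, and for function 13 (write string) AL is the write mode, BH the display page, BL the attribute, CX the length, DH and DL the row and column, with the string itself at ES:BP. The Python sketch below just does that splitting for the values in this program; split_halves is a made-up helper name.

```python
def split_halves(word):
    """Split a 16-bit register value into (high byte, low byte), e.g. AX -> (AH, AL)."""
    return (word >> 8) & 0xFF, word & 0xFF


# First INT 10: MOV AX,0003
ah, al = split_halves(0x0003)
print(f"AH={ah:02X} set video mode, AL={al:02X} mode 03 (80x25 color text)")

# Second INT 10: MOV AX,1300 / MOV BX,0004 / MOV CX,000D / MOV DX,0000
ah, al = split_halves(0x1300)
bh, bl = split_halves(0x0004)
dh, dl = split_halves(0x0000)
cx = 0x000D
print(f"AH={ah:02X} write string, AL={al:02X} write mode")
print(f"BH={bh:02X} display page, BL={bl:02X} attribute (04 is red on black)")
print(f"CX={cx} characters, DH={dh} row, DL={dl} column")
```

So the registers have no fixed meaning of their own; what AX or DX means depends on which interrupt and function you are calling.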
Now DEBUG has a primitive assembler built into it, so we will fashion our
source code to assemble with DEBUG.
Code:
N HELLO.COM
E 118 "Hello, World!"
A 100
MOV AX,0003 ;set video mode 0003 (also clears screen)
INT 10 ;call BIOS video services
MOV BP,0118 ;address of a buffer (for our text string)
MOV AX,1300 ;display a string
MOV BX,0004 ;display attribute (color)
MOV CX,000D ;string length
MOV DX,0000 ;cursor location
INT 10 ;call Bios video services
INT 20 ;terminate program (and return to DOS)
RCX
25
W
Q
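Make sure your source file has a blank line (an extra carriage return) after
the last assembly instruction, before the RCX command; that empty line is what
tells DEBUG to leave assembly mode. There must also be a carriage return at the
very end of the file, after the Q. Comments may follow the assembly instructions,
set off with a semicolon, but not on any other lines. The RCX command puts the
file length (25 hex) into the CX register, and W writes that many bytes, starting
at offset 100, to the file named by N.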
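The two magic numbers in the script, 118 for the text and 25 for the length, both fall out of the instruction sizes. The Python sketch below works them out and builds the script text, blank line included. It is only an illustration: the byte sizes are read off the unassembly listing, and build_script and the table layout are my own hypothetical names, not anything DEBUG provides.

```python
TEXT = "Hello, World!"
LOAD = 0x100  # COM files are loaded at offset 0100

# (assembly line, encoded size in bytes), sizes taken from the u listing
PROGRAM = [
    ("MOV AX,0003", 3),
    ("INT 10", 2),
    ("MOV BP,{text:04X}", 3),
    ("MOV AX,1300", 3),
    ("MOV BX,0004", 3),
    ("MOV CX,{length:04X}", 3),
    ("MOV DX,0000", 3),
    ("INT 10", 2),
    ("INT 20", 2),
]

code_size = sum(size for _, size in PROGRAM)
text_at = LOAD + code_size          # the text goes right after the code
total = code_size + len(TEXT)       # what RCX must be set to


def build_script(name="HELLO.COM"):
    """Return the DEBUG input script as one string with DOS line endings."""
    lines = [f"N {name}", f'E {text_at:X} "{TEXT}"', f"A {LOAD:X}"]
    lines += [asm.format(text=text_at, length=len(TEXT)) for asm, _ in PROGRAM]
    # The empty string becomes the blank line that ends assembly mode.
    lines += ["", "RCX", f"{total:X}", "W", "Q"]
    return "\r\n".join(lines) + "\r\n"


print(f"text at {text_at:X}, length {total:X}")
print(build_script(), end="")
```

If you change the text, the length in CX and the total in RCX change with it, which is exactly the kind of thing that is easy to get wrong by hand.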
Now we type the following command:
Code:
C:\junk>debug<hello.txt
and DEBUG gives you the following screen output.
Code:
-N HELLO.COM
-E 118 "Hello, World!"
-A 100
138E:0100 MOV AX,0003 ;set video mode 0003 (also clears screen)
138E:0103 INT 10 ;call BIOS video services
138E:0105 MOV BP,0118 ;address of a buffer (for our text string)
138E:0108 MOV AX,1300 ;display a string
138E:010B MOV BX,0004 ;display attribute (color)
138E:010E MOV CX,000D ;string length
138E:0111 MOV DX,0000 ;cursor location
138E:0114 INT 10 ;call Bios video services
138E:0116 INT 20 ;terminate program (and return to DOS)
138E:0118
-RCX
CX 0000
:25
-W
Writing 00025 bytes
-Q
C:\junk>
Check a directory listing.
Code:
C:\junk>dir hello.com
Volume in drive C is COMPAQ
Volume Serial Number is 39BD-2F92
Directory of C:\junk
HELLO COM 37 08/09/03 3:13a HELLO.COM
1 file(s) 37 bytes
0 dir(s) 11,114.50 MB free
C:\junk>
and a new file, HELLO.COM has been created. Now, try executing that file.
Code:
C:\junk>hello
Hello, World!
It runs!
I came in to the world with nothing. I still have most of it.
-
August 9th, 2003, 10:32 AM
#2
hey, cool tutorial,
out of interest, did you create the original .com file in assembly? and if so, do the original source and the final unassembled source match up identically?
cheers
i2c
-
August 9th, 2003, 11:25 AM
#3
The original com file was assembled by DEBUG from source written in notepad, so
naturally it is completely reversible. Now, if you unassemble a program written
by someone else, and don't have access to their source code, and if the program
is a lot longer, then it is no longer a trivial task, but with patience you could arrive
at source code that could produce the program.
I came in to the world with nothing. I still have most of it.
-
August 9th, 2003, 12:52 PM
#4
Yeah, I thought that's what you would say. I was convinced that it was a fully reversible process if it was done in the same assembler program.
When they code programs, the other reason there is loads of extra code is that they put in loads of padding as some form of copy protection, although not a very good one.
-
August 9th, 2003, 03:32 PM
#5
Nice tutorial, more helpful than just showing some assembly commands, that's for sure. I was wondering if you could post the original .com (unless I'm blind and didn't notice it), so that we could play with that and follow the demo through on our own. I was also curious about the registers. Is AX always your display, basically? How do all the nX registers work? Is DX always the cursor?
-
August 9th, 2003, 08:04 PM
#6
I came in to the world with nothing. I still have most of it.
-
August 9th, 2003, 10:19 PM
#7
Junior Member
Isn't it called to "disassemble"?
-
August 16th, 2003, 10:41 AM
#8
Junior Member
Here are some books I have on assembly:
-Assembly Language for Intel-Based Computers
*Had to order this one because I think it is a college textbook; the 4th ed. is out now, and it also comes with MASM on CD (if you get it used like I did, hopefully you'll get that with it!)
-Art of Assembly Language Programming
*Lots of good information.
-Assembly Language Step-by-Step By
*Uses the NASM assembler, included with it. Won't make you an exceptional assembly programmer, but uses very good metaphors to help you pick up the concepts if you know pretty much nothing about hex/registers/segments, etc.
-Revolutionary Guide To Assembly Language
*Never even read more than the first couple of chapters before getting another book, so I can't comment on effectiveness, but the format was confusing to me.
When starting to learn assembly, the problems I encountered included not being able to find an assembler; I didn't know of debug at the time and no books included one (which is why I bought the Assembly Language for Intel-Based Computers). I also came into it with the preconception of programming in a 32-bit environment, but no books started on that, which is understandable now. When they say it's hard to learn, it can be, but once you get past the initial blocks, like trying to figure out what in the world the authors are talking about, it gets better. Along the way you'll learn about computer architecture more in depth too!
-
August 17th, 2003, 09:02 AM
#9
Junior Member
Forgot to leave the comment about the tutorial...I was born with ROM, ha ha, n/m. OK, back to the point. It used debug, which was good because there was nothing to download (excluding your COM file if chosen). Also, glad you went over how viewable character data and executable data look the same in memory. I'm still new to assembly, so the reinforcement of that point was helpful, because I never paid attention to it before when trying to dissect something (i.e. dumping then unassembling, which, by the way, I have heard called disassembling too). But since the command in debug is u for unassemble (I think), it seems relevant to the subject.
If you can't explain something to a six-year-old, you really don't understand it yourself - Albert Einstein
If life is supposed to be a gift, how come I have to give it back?
If I die and get frozen, and am brought back later in the future, will the life insurance companies want their money back from the beneficiaries?