Monday, October 09, 2006

Beginner's guide to Stack buffer overflow...

If you are a C/C++ geek with an ounce of interest in systems programming, you have probably tried a stack buffer overflow. Most of the websites out there are either too detailed or too abstract. Some of the popular buffer overflow write-ups claim success on age-old machines; the techniques they list don't work on current systems and are terribly hard to replicate.

So, all you Linux newbs, here is a simplistic buffer overflow exploit written in C... Well, I won't call it an exploit; it is more of a way to modify the return address. A little bit of assembly knowledge would help but is not necessary. I did it on Ubuntu Dapper... fasten your seat belts now.

void function(int a, int b, int c)
{
char ret[1];
}

int main()
{
int x;

x = 0;
function(1,100,3);
x = 1;
printf("%d\n",x);
}
*Adapted example from http://www.cs.wright.edu/people/faculty/tkprasad/courses/cs781/alephOne.html
The above program just prints 1 on the console... what did you think? BTW, keep debugging symbols on while compiling the code, i.e. your command line should be:
$gcc -o program program.c -g


Now it's time to wear that black hat and fire up gdb.

(gdb) break 1
Breakpoint 1 at 0x8048360: file temp.c, line 1.
(gdb) r
Starting program: /home/sridhar/bufov/program

Breakpoint 1, function (a=1, b=-1082010236, c=-1082010228) at temp.c:1
1 void function(int a, int b, int c) {
(gdb) s
3 }
(gdb) info registers
eax 0x10 16
ecx 0xbf81d58c -1082010228
edx 0x1 1
ebx 0xb7ef2adc -1209062692
esp 0xbf81d4a8 0xbf81d4a8
ebp 0xbf81d4b8 0xbf81d4b8
esi 0xbf81d584 -1082010236
edi 0xbf81d510 -1082010352
eip 0x8048366 0x8048366
eflags 0x200282 2097794
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb) print &ret //Hmm, so what is the address of ret?
$1 = (char (*)[1]) 0xbf81d4b7
(gdb) disassemble main //Let's see the return address in main
Dump of assembler code for function main:
0x08048368 : push %ebp
0x08048369 : mov %esp,%ebp
0x0804836b : sub $0x28,%esp
0x0804836e : and $0xfffffff0,%esp
0x08048371 : mov $0x0,%eax
0x08048376 : add $0xf,%eax
0x08048379 : add $0xf,%eax
0x0804837c : shr $0x4,%eax
0x0804837f : shl $0x4,%eax
0x08048382 : sub %eax,%esp
0x08048384 : movl $0x0,0xfffffffc(%ebp)
0x0804838b : movl $0x3,0x8(%esp)
0x08048393 : movl $0x64,0x4(%esp)
0x0804839b : movl $0x1,(%esp)
0x080483a2 : call 0x8048360

|----the instruction below (0x080483a7) is the return address. How did I know that? Well, it's the
instruction right after the function call

0x080483a7 : movl $0x1,0xfffffffc(%ebp)
0x080483ae : mov 0xfffffffc(%ebp),%eax
0x080483b1 : mov %eax,0x4(%esp)
0x080483b5 : movl $0x80484b4,(%esp)
0x080483bc : call 0x80482b0
0x080483c1 : leave
0x080483c2 : ret
End of assembler dump.
(gdb) x 0xbf81d4b8 //Go back up and see the value of ebp... what's it pointing to? Notice it's the byte right after ret

0xbf81d4b8: 0xbf81d4f8
(gdb) x 0xbf81d4b9 //hmm, the return address should be somewhere nearby
0xbf81d4b9: 0xa7bf81d4
(gdb) x 0xbf81d4ba // nah.. this is not the one
0xbf81d4ba: 0x83a7bf81
(gdb) x 0xbf81d4bb //still not there
0xbf81d4bb: 0x0483a7bf
(gdb) x 0xbf81d4bc //BINGO!!
0xbf81d4bc: 0x080483a7
(gdb) print &ret[4] //Now let's find out how far ret is from the return address
$2 = 0xbf81d4bb "\277\247\203\004\b\001"
(gdb) print &ret[5] //GOT IT
$3 = 0xbf81d4bc "\247\203\004\b\001"
Now that we know that the return address starts at &ret[5], let's go for the kill. A brute-force way would have been to just fill ret with a long string so that the buffer overflows. If we know the position of some code in memory, we can overwrite the return address to branch to that address instead of back into main. For the sake of simplicity, I'll just skip a statement in main(), so that the output is 0 instead of 1 (i.e. the statement x=1 is skipped).

From the disassembly of main() we know that the return address should be 0x080483ae instead of 0x080483a7, which means I need to increment the return address by 0x080483ae-0x080483a7=7.

Let's take a look at the code now...

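#include <stdio.h>

void function(int a, int b, int c)
{
char ret[1];
/* the return address starts 5 bytes past ret; add 7 to it so x = 1 is skipped */
*(long *) &ret[5] +=7 ;
}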
int main()
{
int x;
x = 0;
function(1,100,3);
x = 1;
printf("%d\n",x);
}


*Adapted example from http://www.cs.wright.edu/people/faculty/tkprasad/courses/cs781/alephOne.html
OK, WTF is *(long *) &ret[5] +=7 ??
Well, ret[5] on its own is a single char, so ret[5] += 7 would only touch one byte. The return address is a full 4-byte word, and on this 32-bit machine a long is 4 bytes too. Hence the statement takes the address of ret[5], casts that char pointer to a long pointer, dereferences it, and adds 7 to the whole return address. Wait for some time till that concept sinks in...
Feeling better now? Good.
Now compile it:
$gcc -o p2 p2.c
and run it:
$./p2
0
If you followed everything till this point, you are no longer a newb...

Update: I referred to this article.

2 comments:

  1. interesting !! :) nice to see that you learned it doing your ms ... i learned it working my *** off ....

  2. haha... not from my academics dude... i too learned whilst working my *** off (i work for a security guy...remember)
