[4] Compiler Optimizations
By Niles
1 Compiler Optimizations
The general idea of optimization is to replace code with more efficient code that has the same end result.
The problem with this is that the ANSI C standard only requires the results of the new code to match those of the original code on an “abstract machine”, and on the abstract machine, undefined behavior can never happen.
Undefined behavior in the C standard consists of a “shall” or “shall not” requirement of the standard being violated, anything explicitly listed in the standard as “undefined behavior”, or anything not covered by the standard at all.
The general problem is that programmers sometimes check for undefined behavior after the fact to make sure it has not happened. The compiler assumes that undefined behavior can never happen and simply “optimizes” away the programmer’s check.
2 Algebraic Simplification
Some example C code might look like this
char *ptr; // ptr to start of array
char *max; // ptr to end of array
size_t len;
// Other code
if (ptr + len > max)
    return EINVAL;
This makes sense for normal values of len. However, if len is generated incorrectly or supplied by the user, ptr + len may overflow, wrapping around to an address smaller than max and allowing whoever controls len to access arbitrary memory beyond the end of the array.
The common resolution to this is to write something like this:
if (ptr + len < ptr || ptr + len > max)
    return EINVAL;
Unfortunately, in this instance the compiler can apply algebraic simplification. When comparing P + V1 and P + V2 where P is the same pointer and V1 and V2 are the same integer type, the compiler optimizes it down to a comparison between V1 and V2. What this means for our program is that our check, ptr + len < ptr could be rewritten as ptr + len < ptr + 0, and that can be algebraically simplified to len < 0, which is impossible because len is unsigned.
This means that to our compiler, ptr + len < ptr is dead code because it will always return false, and gcc 4.3 with -O2 will “optimize” it away.
The reliable mitigation is to rewrite the check so that no pointer arithmetic can overflow, something like this
if (len > max - ptr)
    return EINVAL;
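The rewritten check can be packaged as a small helper. The following is a sketch; the function name and signature are illustrative, not from the original code:

```c
#include <errno.h>
#include <stddef.h>

/* Returns 0 if [ptr, ptr + len) fits inside [ptr, max), else EINVAL.
 * Assumes ptr and max point into the same array with ptr <= max, so
 * max - ptr is non-negative and the cast to size_t is safe. No pointer
 * arithmetic here can overflow, so the compiler cannot delete the check. */
int check_len(const char *ptr, const char *max, size_t len) {
    if (len > (size_t)(max - ptr))
        return EINVAL;
    return 0;
}
```

Because len is compared against an integer difference rather than added to a pointer, even len == SIZE_MAX is rejected correctly.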
3 Algebraic Simplification, cont.
Another example of algebraic simplification changing the result of code might look like the following
int i = (x * 2000) / 1000;
A compiler would usually optimize this to i = x * 2, however if x * 2000 were going to overflow, this does not do the same thing. For example, if x were set to 1073742, then the non-optimized program (on a typical two’s-complement machine) would set i to -2147483, however the optimized code would return the mathematically correct value of 2147484.
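Since signed overflow is undefined behavior, the wrapped result cannot be demonstrated portably; the sketch below models the two’s-complement wraparound using well-defined unsigned arithmetic to show how the two forms diverge. The conversion back to int is implementation-defined and is assumed here to wrap, as it does on common compilers:

```c
/* Model the unoptimized computation: we simulate the wraparound a
 * typical two's-complement machine would produce, using well-defined
 * unsigned math. The conversion back to int is implementation-defined;
 * this sketch assumes the usual wraparound conversion. */
int unoptimized(int x) {
    int wrapped = (int)((unsigned)x * 2000u); /* wraps past INT_MAX */
    return wrapped / 1000;
}

/* The algebraically simplified version the compiler may emit. */
int optimized(int x) {
    return x * 2;
}
```

With x = 1073742, the first function yields -2147483 and the second 2147484, matching the discrepancy described above.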
4 Integer Overflow
Consider the code
int f() {
    int i;
    int j = 0;
    for (i = 1; i > 0; i += i)
        ++j;
    return j;
}
Eventually the loop will end because i will overflow. However, since the C standard says signed integer overflow is undefined behavior, the compiler assumes it can never happen, and gcc 4.3.2 with -O2 “optimizes” this code into an infinite loop.
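A rewrite that terminates without relying on overflow checks the bound explicitly before doubling. This is an illustrative sketch, not from the original lecture:

```c
#include <limits.h>

/* Count the doublings of a positive int without invoking signed
 * overflow: stop just before i += i would exceed INT_MAX. Every
 * operation here is well defined, so no compiler may turn this
 * into an infinite loop. */
int count_doublings(void) {
    int j = 0;
    for (int i = 1; ; i += i) {
        ++j;
        if (i > INT_MAX / 2)   /* the next doubling would overflow */
            break;
    }
    return j;
}
```

On a platform with 32-bit int, this visits i = 2^0 through 2^30 and returns 31.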
This type of check even exists in the GNU C Library: the implementation of mktime contains checks meant to return -1 when a time adjustment would fail due to integer overflow, but the compiler removes them, so mktime always returns as if the adjustment succeeded.
5 Loop Hoisting
The compiler is allowed to pull partial (or even sometimes full) statements from inside a loop to outside the loop. This is useful if you are, for example, adding a complex numerical expression to a final result, but the variables in the expression are not modified in the loop. In that case, something like
for (int i = 1; i < 10; i++) {
    total += (a * x * x) % i;
}
may be optimized into the much faster code
int temp = a * x * x;
for (int i = 1; i < 10; i++) {
    total += temp % i;
}
The unfortunate side effect of this is that the compiler is not just pulling constant expressions outside of loops; it can also rearrange the order of statements inside a loop for optimal efficiency. For example, consider the following code
signed int si1 = atoi(argv[1]);
signed int si2 = atoi(argv[2]);
signed int result = 8;
size_t i;
puts("log message one.\n");
for (i = 0; i < MAX; ++i) {
    puts("log message two.\n");
    if (argc == 8) i++;
    result += i + si1 % si2;
    puts("log message three.\n");
}
printf("Result = %d.\n", result);
In this case, assuming argv[2] parses to zero so that si2 is zero, the program should print “log message one.”, then “log message two.”, then fail with a floating point exception when it evaluates si1 % si2. The optimized program, however, hoists the loop-invariant si1 % si2 above the loop, so it only prints the first message before failing, leading to some serious head-scratching debugging hell.
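One defensive rewrite validates si2 before the loop, so that neither the original ordering nor the hoisted form can divide by zero. This is a sketch; the function wrapper, the -1 error return, and the MAX value are added here for illustration:

```c
#include <stdio.h>

#define MAX 10  /* illustrative bound; not from the original code */

/* Hedged sketch: reject si2 == 0 up front, since % by zero is
 * undefined behavior regardless of where the compiler places it. */
int run(int argc, int si1, int si2) {
    int result = 8;
    size_t i;
    if (si2 == 0)
        return -1;
    puts("log message one.");
    for (i = 0; i < MAX; ++i) {
        puts("log message two.");
        if (argc == 8) i++;
        result += (int)i + si1 % si2;
        puts("log message three.");
    }
    return result;
}
```

With the guard in place, hoisting si1 % si2 out of the loop is harmless because the expression is only reachable when si2 is nonzero.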
6 Clearing Sensitive Memory
By far the largest optimization a compiler can make is dead code removal. For example, a malloc call that creates an area of memory that is never used can be removed during compilation without changing the program’s results. However, this is not always desirable for security reasons. For example, if the user types a password into the program, it has to be stored in memory. After the password is done being used, it should be overwritten with either null bytes or random data so that when the memory is reallocated to another process later, the second process cannot see the user’s password. An example is shown in the following code
void getPassword(void) {
    char pwd[64];
    if (GetPassword(pwd, sizeof(pwd))) {
        /* check password */
    }
    memset(pwd, 0, sizeof(pwd));
}
However, since this memory is never read after the call to memset, the compiler can regard the memset as dead code and elide it. This is an obvious security flaw, but there are several ways to mitigate it. The first to be implemented was the Microsoft Visual C function ZeroMemory().
However, it sometimes gets optimized out as well, so they added a new function, SecureZeroMemory(), which the compiler is not allowed to optimize out.
Another solution is to surround the code with #pragma directives, like this
void getPassword(void) {
    char pwd[64];
    if (GetPassword(pwd, sizeof(pwd))) {
        /* check password */
    }
#pragma optimize("", off)
    memset(pwd, 0, sizeof(pwd));
#pragma optimize("", on)
}
The pragma directive is supported on some versions of Microsoft Visual Studio and may be supported on other compilers.
Another common home-baked solution is an abuse of the volatile keyword, like this.
memset(pwd, 0, sizeof(pwd));
*(volatile char*)pwd = *(volatile char*)pwd;
However, depending on the implementation this sometimes zeroes the whole buffer but sometimes zeroes only the first byte, leaving the remainder intact.
The final solution is to write a secure memset function that uses the volatile char trick across the whole buffer, something like this
void *secure_memset(void *v, int c, size_t n) {
    volatile unsigned char *p = v;
    while (n--)
        *p++ = c;
    return v;
}
The problem with this is that not all common compilers respect the volatile qualifier in every case. On top of that, it prevents the zeroing from being optimized at all, whereas compilers usually replace memset with a handful of much more efficient assembly instructions.
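For completeness, here is the same volatile-pointer technique in a self-contained sketch, together with a hypothetical caller that scrubs a secret before returning. The caller, its placeholder “check”, and all names are illustrative, not from the original text:

```c
#include <stddef.h>
#include <string.h>

/* The volatile-qualified pointer forces the compiler to perform every
 * store, so the zeroing cannot be discarded as dead code. */
void *secure_memset(void *v, int c, size_t n) {
    volatile unsigned char *p = v;
    while (n--)
        *p++ = (unsigned char)c;
    return v;
}

/* Illustrative caller: copy a secret into a stack buffer, "check" it,
 * and scrub the buffer before it goes out of scope. */
int handle_secret(const char *input, size_t len) {
    char secret[64];
    int ok = 0;
    if (len < sizeof(secret)) {
        memcpy(secret, input, len);
        secret[len] = '\0';
        ok = (secret[0] != '\0');        /* placeholder check */
        secure_memset(secret, 0, sizeof(secret));
    }
    return ok;
}
```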
7 The Volatile Qualifier
Take the example code
volatile int buffer_ready;
char buffer[BUF_SIZE];
void buffer_init() {
    for (size_t i = 0; i < BUF_SIZE; i++)
        buffer[i] = 0;
    buffer_ready = 1;
}
In this case, it would seem that buffer_ready should be immune from common optimization problems. However, because the for loop does not access any volatile locations, the compiler is free to move the buffer_ready = 1; store above the loop, defeating the developer’s intent.
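One modern fix, assuming a C11 compiler, is to make the flag an atomic and publish it with a release store, which may not be reordered before the writes that precede it. This is a sketch; BUF_SIZE is chosen arbitrarily here:

```c
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 16  /* illustrative size */

static atomic_int buffer_ready;
static char buffer[BUF_SIZE];

/* C11 sketch: a release store cannot be moved above the preceding
 * writes, so any reader that acquire-loads buffer_ready and sees 1
 * is guaranteed to also see the zeroed buffer. */
void buffer_init(void) {
    for (size_t i = 0; i < BUF_SIZE; i++)
        buffer[i] = 0;
    atomic_store_explicit(&buffer_ready, 1, memory_order_release);
}
```

Unlike volatile, the release/acquire pairing also constrains hardware reordering, which volatile alone does not.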
8 Optimizing for Embedded Systems
The following C code is a function that resets a watchdog timer in a hypothetical embedded system
extern volatile int WATCHDOG;
void reset_watchdog() {
    WATCHDOG = WATCHDOG; /* load, then store */
}
Regardless of the optimization level, a conforming compiler must convert this to code that loads and then stores the watchdog register.
Recent versions of GCC for IA-32 emit this assembly code
reset_watchdog:
    movl WATCHDOG, %eax
    movl %eax, WATCHDOG
    ret
However, the latest version of GCC’s port to the MSP430 microcontroller compiles the code into the following assembly:
reset_watchdog:
    ret
Thus the compiler’s optimizations prevent the timer from being reset.
9 Null-Pointer Checks
gcc deletes null-pointer checks beyond the first use of a pointer at optimization level two or higher
void bad_code(void *a) {
    int *b = a;
    int c = *b;
    static int d;
    if (b) {
        d = c;
    }
}
In the line int c = *b;, we dereference b. gcc assumes that this dereference would trigger a hardware fault if b were NULL, and therefore that b cannot be NULL.
This causes gcc to completely eliminate the later if (b) check, so d = c is always run.
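The fix is simply to test the pointer before the first dereference. The sketch below also adds a return value, which the original void function lacks, purely so the behavior is observable:

```c
#include <stddef.h>

/* Check b before any dereference, so the compiler has no basis for
 * deleting the NULL test. The int return and fallback parameter are
 * added here for illustration only. */
int good_code(void *a, int fallback) {
    int *b = a;
    if (b)
        return *b;
    return fallback;
}
```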
This can be seen in Linux kernel 2.6.30
struct sock *sk = tun->sk; // sk initialized to tun->sk
// ...
if (!tun) return POLLERR;
This code will never return POLLERR, because after the dereference tun->sk the compiler assumes tun cannot be NULL. The bug could be exploited locally by mapping page zero with mmap() and then triggering it in the process’s context, redirecting execution to attacker-controlled data.
This was fixed shortly afterward by moving the initialization of sk down to below the if statement.
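The fix pattern can be illustrated with toy stand-ins for the kernel types; the struct definitions and function name below are hypothetical, not the actual kernel code:

```c
#include <poll.h>
#include <stddef.h>

struct sock { int refs; };                 /* toy stand-in */
struct tun_struct { struct sock *sk; };    /* toy stand-in */

/* Fixed ordering: test tun before touching tun->sk, so the compiler
 * cannot infer that tun is non-NULL and delete the check. */
int tun_poll_fixed(struct tun_struct *tun) {
    struct sock *sk;
    if (!tun)
        return POLLERR;
    sk = tun->sk;          /* dereference only after the check */
    return sk ? 0 : POLLERR;
}
```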
10 That’s All, Folks
This has been a summary of Dangerous Optimizations and the Loss of Causality, a lecture given in 2010 by Professor Robert C. Seacord at Carnegie Mellon, CS 15-392.