Monday, September 15, 2014

Why static analyzers should see all the code

Just for fun, I decided to run the new "standard markdown" C code through the static analyzer provided by the Clang project. On the surface, this looks very easy:


CCC_CC=clang scan-build make stmd

It even finds bugs: a lot of dead assignments, and some logic and memory errors (dereferencing a null pointer, memory leaks, and a double free). But are they real?

For example, it complains that the following piece of code in src/bstrlib.c may leak the memory pointed to by buff, which was allocated earlier in the same function:


bdestroy (buff);
return ret;

The analyzer does not understand that bdestroy is a memory deallocation function. Indeed, it could be anything. It could be defined in a different file. And in fact, bdestroy does not destroy the buffer, and thus leaks the memory, if some integrity check fails (and its return code is never checked).
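To make that failure mode concrete, here is a minimal sketch of a destroy function that validates its argument before freeing it. The names buf, buf_create, buf_destroy and buf_demo are hypothetical, and the particular checks are an assumption about the general shape of such code, not a copy of bstrlib:

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_OK   0
#define BUF_ERR (-1)

/* Hypothetical stand-in for a bstring; the field names are assumptions. */
struct buf {
    int   len;   /* bytes in use */
    int   cap;   /* bytes allocated for data */
    char *data;
};

struct buf *buf_create(int cap) {
    struct buf *b;
    if (cap <= 0) return NULL;
    b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc((size_t)cap);
    if (b->data == NULL) { free(b); return NULL; }
    b->len = 0;
    b->cap = cap;
    return b;
}

/* Frees b only when it passes the sanity checks. On failure nothing is
   released, so a caller that ignores the result leaks the allocation. */
int buf_destroy(struct buf *b) {
    if (b == NULL || b->len < 0 || b->cap <= 0 ||
        b->cap < b->len || b->data == NULL)
        return BUF_ERR;
    free(b->data);
    free(b);
    return BUF_OK;
}

/* Usage: a corrupted header makes buf_destroy refuse to free. */
void buf_demo(void) {
    struct buf *b = buf_create(16);
    assert(b != NULL);
    b->len = -1;                        /* simulate corruption */
    assert(buf_destroy(b) == BUF_ERR);  /* b is still allocated here */
    b->len = 0;                         /* repair, then free for real */
    assert(buf_destroy(b) == BUF_OK);
}
```

An analyzer that can see the body of such a function is right to point out that the early return path releases nothing.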

So the code of bdestroy does smell a bit. But is it a real problem? How can we convince clang that this can't happen?

Part of the problem stems from the fact that the analyzer looks at one source file at a time and thus cannot follow calls into functions defined in other files. There is, however, a way to fix it.

All we need to do is create a C source file that includes all the other C source files. Let's call it "all.c".


#include "blocks.c"
#include "bstrlib.c"
#include "detab.c"
#include "html.c"
#include "inlines.c"
#include "main.c"
#include "print.c"
#include "scanners.c"
#include "utf8.c"

Unfortunately, it does not compile out of the box, because of the conflicting "advance" macros in inlines.c and utf8.c (fixable by undefining these macros at the end of each file), and because of the missing header guard around stmd.h (trivially fixable by adding one). With that, one can submit this all-inclusive file to the static analyzer:
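Both fixes can be sketched in a single file that mimics what the preprocessor sees once everything is included together. The type subject, the two macro bodies, and the functions skip_one, count_bytes and unity_demo are hypothetical stand-ins, not the actual stmd code:

```c
#include <assert.h>

/* --- stand-in for stmd.h, pasted twice to mimic a double #include --- */
#ifndef STMD_H
#define STMD_H
typedef struct { const char *text; int pos; } subject;  /* hypothetical */
#endif /* STMD_H */

#ifndef STMD_H   /* second inclusion: the guard skips the whole body */
#define STMD_H
typedef struct { const char *text; int pos; } subject;
#endif /* STMD_H */

/* --- stand-in for inlines.c --- */
#define advance(s) ((s)->pos += 1)
int skip_one(subject *s) { advance(s); return s->pos; }
#undef advance   /* keep the macro from leaking into the next file */

/* --- stand-in for utf8.c --- */
#define advance(p) ((p)++)
int count_bytes(const unsigned char *p) {
    int n = 0;
    while (*p != 0) { advance(p); n++; }
    return n;
}
#undef advance

/* Usage: each "file" sees only its own definition of advance. */
void unity_demo(void) {
    subject s = { "abc", 0 };
    assert(skip_one(&s) == 1);
    assert(count_bytes((const unsigned char *)"abc") == 3);
}
```

Without the guard, the second typedef would clash with the first; without the #undef lines, the second #define would redefine a macro that is still in effect.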


scan-build clang -g -O3 -Wall -std=c99 -c -o src/all.o src/all.c

Result: no bugs found, except dead assignments.
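For readers unfamiliar with the term, a dead assignment is a store whose value is never read afterwards. A minimal illustration of the pattern (the functions last_digit and last_digit_demo are hypothetical, not taken from stmd):

```c
#include <assert.h>

int last_digit(int n) {
    int r;
    r = n / 10;   /* dead assignment: overwritten below before any read */
    r = n % 10;
    return r;
}

/* Usage: the dead store does not change the result. */
void last_digit_demo(void) {
    assert(last_digit(123) == 3);
}
```

Such stores are usually harmless, which is why they are easy to set aside when looking for real bugs.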