Saturday, October 8, 2016

The LAVA Synthetic Bug Corpora

I'm planning a longer post discussing how we evaluated the LAVA bug injection system, but since we've gotten approval to release the test corpora, I wanted to make them available right away.

The corpora described in the paper, LAVA-1 and LAVA-M, can be downloaded here:

http://panda.moyix.net/~moyix/lava_corpus.tar.xz (101M)

Quoting from the included README:

This distribution contains the automatically generated bug corpora used in the paper, "LAVA: Large-scale Automated Vulnerability Addition".

LAVA-1 is a corpus consisting of 69 versions of the "file" utility, each of which has had a single bug injected into it. Each bug is a named branch in a git repository. The triggering input can be found in the file named CRASH_INPUT. To run the validation, you can use validate.sh, which builds each buggy version of file and evaluates it on the corresponding triggering input.
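
The branch-per-bug layout can be illustrated with the minimal bash sketch below. It is not the actual validate.sh. It checks out each branch of a repository and runs a caller-supplied command on it. The function names list_bug_branches and run_on_each_branch are hypothetical. The build-and-run step is left to the caller because the exact commands for building file on each branch are an assumption here.

```shell
#!/bin/bash
# Hypothetical sketch, not the real LAVA-1 validate.sh: each injected bug is
# assumed to live on its own local branch of one git repository.

# Print the name of every local branch in repository $1, one per line.
list_bug_branches() {
    git -C "$1" for-each-ref --format='%(refname:short)' refs/heads/
}

# For each branch of repository $1: check it out, run the remaining
# arguments as a command inside the repository, and print
# "<branch> <exit status>". A status above 128 means the command was
# killed by a signal, i.e. the crash was reproduced.
run_on_each_branch() {
    local repo=$1
    shift
    local branch rv
    for branch in $(list_bug_branches "$repo"); do
        if ! git -C "$repo" checkout -q "$branch"; then
            echo "$branch checkout-failed"
            continue
        fi
        # Send the command's own output to stderr so build errors stay
        # visible without mixing into the "<branch> <status>" report.
        ( cd "$repo" && "$@" ) >&2
        rv=$?
        echo "$branch $rv"
    done
}
```

For example, `run_on_each_branch path/to/repo /path/to/build_and_run.sh` would print one line per bug. Here build_and_run.sh is a hypothetical script that builds file and runs it on CRASH_INPUT. Keeping the build output visible avoids hiding compile failures.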

LAVA-M is a corpus consisting of four GNU coreutils programs (base64, md5sum, uniq, and who), each of which has had a large number of bugs added. Each injected, validated bug is listed in the validated_bugs file, and the corresponding triggering inputs can be found in the inputs subdirectory. To run the validation, you can use the validate.sh script, which builds the buggy utility and evaluates it on triggering and non-triggering inputs.

For both corpora, the "backtraces" subdirectory contains the output of gdb's backtrace command for each bug.

Enjoy!

3 comments:

Yuwei Li said...

Hey, I downloaded the LAVA corpora, ran the validate.sh script, and got the following output in the Ubuntu terminal:
----------------------
Building buggy base64...
Checking if buggy base64 succeeds on non-trigger input...
Success: base64 -d inputs/utmp.b64 returned 127
Validating bugs...
Validated 0 / 44 bugs
You can see validated.txt for the exit code of each buggy version.
--------------------------
which means I didn't succeed in injecting the bugs. One of the lines in validate.sh is "./configure --prefix=`pwd`/lava-install LIBS="-lacl" &> /dev/null", but I cannot find the "lava-install" directory.
How can I solve this problem? Thanks very much.

Brendan Dolan-Gavitt said...

Hi,

127 is the error code bash returns when the program can't be found. So it sounds like some part of the compilation process is failing and none of the coreutils programs have actually been built. I'd recommend running the compile step by hand to see what's going wrong, and then fixing that.
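
The bash sketch below contrasts exit status 127 with the statuses above 128 that the validation script counts as a triggered bug. The path ./lava-install-missing/bin/base64 is hypothetical and stands in for a binary that was never built. The crash is simulated by having a child shell send itself SIGSEGV, which is signal 11 on Linux.

```shell
#!/bin/bash
# Exit status 127: bash could not find the program at all.
# The path below is hypothetical; it plays the role of a lava-install
# binary whose build silently failed.
./lava-install-missing/bin/base64 -d inputs/utmp.b64 2> /dev/null
missing_rv=$?
echo "missing binary returned $missing_rv"

# Exit status above 128: the program ran but was killed by a signal,
# reported as 128 + signal number (SIGSEGV is 11, so a segfault gives 139).
bash -c 'kill -SEGV $$' 2> /dev/null
crash_rv=$?
echo "crashed program returned $crash_rv"
```

A status of 127 on the clean input therefore points at the build, not at the injected bugs.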

Yuwei Li said...

Mr. Dolan-Gavitt, thank you very much. I changed validate.sh to remove the "&> /dev/null" redirection, and I built the program successfully.
The changed script is as follows:
--------------------------------------------------------
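#!/bin/bash
PROG="base64"
PROGOPT="-d"
INPUT_PATTERN="inputs/utmp-fuzzed-%s.b64"
INPUT_CLEAN="inputs/utmp.b64"
echo "Building buggy ${PROG}..."
cd coreutils-8.24-lava-safe || exit 1
make clean
# Build output is no longer sent to /dev/null, so compile errors are visible.
# $(pwd) replaces a hard-coded home directory, as in the original validate.sh.
./configure --prefix="$(pwd)/lava-install" LIBS="-lacl"
make
make install
cd ..
# The buggy binary should not crash (exit code below 128) on the clean input.
./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_CLEAN}
rv=$?
if [ $rv -lt 128 ]; then
echo "Success: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
else
echo "ERROR: ${PROG} ${PROGOPT} ${INPUT_CLEAN} returned $rv"
fi
echo "Validating bugs..."
# Run the binary on each bug's triggering input and record "<bug> <exit code>".
# The program's own output is still discarded here; otherwise it would be
# written into validated2.txt and break the count below.
while read -r line ; do
INPUT_FUZZ=$(printf "$INPUT_PATTERN" "$line")
{ ./coreutils-8.24-lava-safe/lava-install/bin/${PROG} ${PROGOPT} ${INPUT_FUZZ} ; } &> /dev/null
echo "$line $?"
done < validated_bugs > validated2.txt
# An exit code above 128 means the program was killed by a signal, i.e. the bug triggered.
# The awk program must stay on one line: a newline inside an awk string is a syntax error.
awk 'BEGIN {valid = 0} $2 > 128 { valid += 1 } END { print "Validated", valid, "/", NR, "bugs" }' validated2.txt
echo "You can see validated2.txt for the exit code of each buggy version."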
--------------------------------------------------------------------------------