Posts

Showing posts from 2015

PANDA VM Update October 2015

The PANDA virtual machine has once again been updated, and you can download it from: http://laredo-13.mit.edu/~brendan/pandavm-20151002.ova

Notable changes:

- We fixed a record/replay bug that was preventing Debian Wheezy and above from replaying properly.
- The QEMU GDB stub now works during replay, so you can break, step, etc. at various points during the replay to figure out what's going on. We still haven't implemented reverse-step though – hopefully in a future release.
- Thanks to Manolis Stamatogiannakis, the Linux OS Introspection code can now resolve file descriptors to actual filenames. Tim Leek then extended the file_taint plugin to use this information, so file-based tainting should be more accurate now, even if things like dup() are used.
- We have added support for more versions of Windows in the syscalls2 code.

Enjoy!
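To see why dup() matters for file-based tainting, here is a toy sketch of a descriptor-to-filename table. It is not PANDA's introspection code: the `FdTable` class and its `on_open`, `on_dup`, and `on_close` hooks are hypothetical names made up for illustration. The point is only that a tracker which records filenames at open() time loses track of a file unless it also copies the mapping when a descriptor is duplicated.

```python
class FdTable:
    """Toy per-process map from file descriptor to filename (hypothetical)."""

    def __init__(self):
        self.names = {}

    def on_open(self, fd, filename):
        # Remember which file a freshly opened descriptor refers to.
        self.names[fd] = filename

    def on_dup(self, old_fd, new_fd):
        # dup() makes new_fd refer to the same open file as old_fd,
        # so the filename has to be carried over as well.
        if old_fd in self.names:
            self.names[new_fd] = self.names[old_fd]

    def on_close(self, fd):
        # Closing one descriptor does not affect its duplicates.
        self.names.pop(fd, None)

    def resolve(self, fd):
        return self.names.get(fd)


table = FdTable()
table.on_open(3, "/tmp/secret.txt")  # made-up path for the example
table.on_dup(3, 4)
table.on_close(3)
resolved = table.resolve(4)
print(resolved)
```

Without the `on_dup` step, a read through descriptor 4 would not be attributed to the file, and any taint labels keyed on the filename would be missed.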

(Sys)Call Me Maybe: Exploring Malware Syscalls with PANDA

System calls are of great interest to researchers studying malware, because they are the only way that malware can have any effect on the world – writing files to the hard drive, manipulating the registry, sending network packets, and so on all must be done by making a call into the kernel. In Windows, the system call interface is not publicly documented, but there have been lots of good reverse engineering efforts, and we now have full tables of the names of each system call; in addition, by using the Windows debug symbols, we can figure out how many arguments each system call takes (though not yet their actual types).

I recently ran 24,389 malware replays under PANDA and recorded all the system calls made, along with their arguments (just the top-level argument, without trying to descend into pointer types or dereference handle types). So for each replay, we now have a log file that looks like:

3f9b2340 NtGdiFlush
3f9b2340 NtUserGetMessage 0175feac 00000000 00000000 000000
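As a rough sketch of how such a log could be consumed, the snippet below tallies system call names. It rests on assumptions not confirmed by the excerpt: that each entry sits on its own line, that fields are whitespace-separated, that the first field is a hex identifier, the second is the call name, and any remaining fields are hex arguments. The helpers `parse_line` and `count_syscalls` are hypothetical, and the sample keeps only the arguments that are fully visible in the excerpt.

```python
from collections import Counter


def parse_line(line):
    """Split one log line into (identifier, syscall name, integer args).

    Assumed layout: <hex id> <name> [<hex arg> ...]. Returns None for
    lines that have fewer than two fields.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    ident, name = fields[0], fields[1]
    args = [int(a, 16) for a in fields[2:]]
    return ident, name, args


def count_syscalls(lines):
    """Count how many times each system call name appears."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts


# Sample lines taken from the excerpt, shortened to the complete fields.
sample = [
    "3f9b2340 NtGdiFlush",
    "3f9b2340 NtUserGetMessage 0175feac 00000000 00000000",
]
counts = count_syscalls(sample)
for name, n in counts.most_common():
    print(name, n)
```

For a real replay log, the same function could be fed an open file object, since iterating over a file yields one line at a time.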

One Weird Trick to Shrink Your PANDA Malware Logs by 84%

When I wrote about some of the lessons learned from PANDA Malrec's first 100 days of operation, one of the things I mentioned was that the storage requirements for the system were extremely high. In the four months since, the storage problem only got worse: as of last week, we were storing 24,000 recordings of malware, coming in at a whopping 2.4 terabytes of storage. The amount of data involved poses problems not just for our own storage but also for others wanting to make use of the recordings for research. 2.4 terabytes is a lot, especially when it's spread out over 24,000 HTTP requests. If we want our data to be useful to researchers, it would be great if we could find better ways of compressing the recording logs. As it turns out, we can! The key is to look closely at what makes up a PANDA recording:

- The log of non-deterministic events (the -rr-nondet.log files)
- The initial QEMU snapshot (the -rr-snp files)

The first of these is highly redundant and actually
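To put the figures above in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (1 TB = 1,000,000 MB) and that the 84% reduction in the post's title applies uniformly to the whole 2.4 TB corpus; both are simplifying assumptions rather than measurements from the post.

```python
total_tb = 2.4        # total storage quoted above
recordings = 24_000   # number of recordings quoted above
reduction = 0.84      # figure from the post title, assumed to apply to the total

# Average size of one recording before compression, in megabytes.
avg_mb = total_tb * 1_000_000 / recordings

# Estimated totals if the whole corpus shrank by the quoted fraction.
after_tb = total_tb * (1 - reduction)
after_avg_mb = avg_mb * (1 - reduction)

print(f"average recording: about {avg_mb:.0f} MB")
print(f"after reduction: about {after_tb:.2f} TB total, {after_avg_mb:.0f} MB each")
```

Under those assumptions, an average recording is on the order of 100 MB, and the corpus would drop to a bit under 0.4 TB.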

PANDA VM Update April 2015

The PANDA virtual machine has been updated to the latest version of PANDA, which corresponds to commit ce866e1508719282b970da4d8a2222f29f959dcd. You can download it here: http://laredo-13.mit.edu/~brendan/pandavm-20150413.tar.bz2

Some notable changes:

- The taint system has been rewritten and is now available as the taint2 plugin. It is at least 10x faster, and uses much less memory. You can check out an example of how to use it in the recently updated tainted instructions tutorial.
- Since taint is now usable, I have increased the amount of memory in the VM to 4GB, which is reasonable for most tasks that use taint.
- PANDA now understands system calls and their arguments on Linux (x86 and ARM) and Windows 7 (x86). This is available in the syscalls2 plugin, and even has some documentation.
- There is now a generic logging format for PANDA, which uses Protocol Buffers. Check out the pandalog documentation for more details.

There's lots more that has changed, and I will t

100 Days of Malware

It's now been a little over 100 days since I started running malware samples in PANDA and making the executions publicly available. In that time, we've analyzed 10,794 pieces of malware, which generated:

- 10,794 record/replay logs, representing 226,163,195,948,195 instructions executed
- 10,794 packet captures, totaling 26GB of data and 33,968,944 packets
- 10,794 movies, which are interesting enough that I'll give them their own section
- 10,794 VirusTotal reports, indicating what level of detection they had when they were run by malrec
- 107 torrents, containing downloads of the above

I've been pleased by the interest malrec has generated. We've had visitors from over 6000 unique IPs, in 89 different countries.

The Movies

There's a lot of great stuff in these ~10K movies. An easy way to get an idea of what's in there is to sort by filesize; because of the way MP4 encoding works, larger files in general mean that there's more going on o