libFuzzer for Beginners

Introduction

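libFuzzer is an in-process, coverage-guided, evolutionary fuzzing engine that is part of the LLVM project. It is linked with the library under test and feeds fuzzed inputs to that library through a dedicated fuzzing entry point (the target function). The fuzzer tracks which areas of the code have been reached and mutates the corpus of input data to maximize code coverage. The coverage information is provided by LLVM's SanitizerCoverage instrumentation.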
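To make the feedback loop described above concrete, here is a toy sketch of the idea. It is not libFuzzer's implementation. The names ToyCoverage and ToyFuzz are made up for illustration, the "coverage" is faked by counting how many leading bytes match "FUZZ", and the mutation is a deterministic "append every possible byte" step rather than the random mutators a real engine uses.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>
#include <vector>

// Stand-in for instrumentation (an assumption for this sketch): reports how
// many leading bytes of the input match "FUZZ". A real engine would get edge
// coverage from SanitizerCoverage instead.
size_t ToyCoverage(const std::string& input) {
  static const std::string kMagic = "FUZZ";
  size_t matched = 0;
  while (matched < input.size() && matched < kMagic.size() &&
         input[matched] == kMagic[matched]) {
    ++matched;
  }
  assert(matched <= kMagic.size());
  return matched;
}

// Toy evolutionary loop: mutate every corpus entry by appending one byte and
// keep a mutant only when it reaches more "coverage" than anything seen so far.
std::vector<std::string> ToyFuzz(size_t max_rounds) {
  std::vector<std::string> corpus = {""};
  size_t best = ToyCoverage(corpus[0]);
  for (size_t round = 0; round < max_rounds; ++round) {
    bool progressed = false;
    // Only visit entries that existed at the start of this round.
    const size_t entries = corpus.size();
    for (size_t i = 0; i < entries; ++i) {
      for (int byte = 0; byte < 256; ++byte) {
        std::string mutant = corpus[i];
        mutant.push_back(static_cast<char>(byte));
        const size_t cov = ToyCoverage(mutant);
        if (cov > best) {  // new coverage: keep the input for later rounds
          best = cov;
          corpus.push_back(mutant);
          progressed = true;
        }
      }
    }
    if (!progressed) break;  // nothing new was reached, stop early
  }
  return corpus;
}
```

Calling ToyFuzz(10) grows the corpus one byte at a time ("", "F", "FU", "FUZ", "FUZZ"). Inputs that reach new code are kept and mutated further, and that is the mechanism the fuzzer relies on.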

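Types of fuzzing:

Generation based: builds test cases from scratch by modeling the target protocol or file format, with no prior state
Mutation based: derives test cases from existing data samples or existing state by applying mutation rules
Evolutionary: generation, mutation, or both, run in-process and guided by code coverage feedback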

Installation

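My test environment is VMware with Ubuntu 16.04 x64 (the officially recommended setup). A recent version of the clang compiler is required, which can be installed with the checkout_build_install_llvm.sh script.
Other dependencies: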

$ sudo apt-get install -y make autoconf automake libtool pkg-config zlib1g-dev
$ git clone https://github.com/Dor1s/libfuzzer-workshop.git
$ cd libfuzzer-workshop/libFuzzer
$ Fuzzer/build.sh

Writing a fuzzer

Fuzzer example 1
Consider the following function (libfuzzer-workshop/lessons/04/vulnerable_functions.h):

bool VulnerableFunction1(const uint8_t* data, size_t size) {
  bool result = false;
  if (size >= 3) {
    result = data[0] == 'F' &&
             data[1] == 'U' &&
             data[2] == 'Z' &&
             data[3] == 'Z';
  }
  return result;
}

Test it with the following fuzz target:

#include <stdint.h>
#include <stddef.h>
#include "vulnerable_functions.h"
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    VulnerableFunction1(data, size);
    return 0;
}

Compile the fuzzer as follows:

$ clang++ -g -std=c++11 -fsanitize=address -fsanitize-coverage=trace-pc-guard \
first_fuzzer.cc ../../libFuzzer/libFuzzer.a \
-o first_fuzzer

Memory Tools

AddressSanitizer: detects use-after-free, buffer overflows (heap, stack, globals), stack-use-after-return, container-overflow
MemorySanitizer: detects uninitialized memory reads
UndefinedBehaviorSanitizer: detects many classes of bugs, especially type confusion, signed-integer-overflow, undefined shift, etc.

Create an empty directory to hold the corpus, then run the fuzzer:

$ mkdir corpus1
$ ./first_fuzzer corpus1
INFO: Seed: 675115685
INFO: Loaded 1 modules (39 guards): [0x773ea0, 0x773f3c),
Loading corpus dir: corpus1
INFO: -max_len is not provided, using 64
#0    READ units: 4
#4    INITED cov: 7 ft: 6 corp: 4/131b exec/s: 0 rss: 29Mb
#100    NEW    cov: 7 ft: 7 corp: 5/133b exec/s: 0 rss: 29Mb L: 2 MS: 1 EraseBytes-
=================================================================
==3898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200001b133 at pc 0x000000515620 bp 0x7ffd8ed9b2b0 sp 0x7ffd8ed9b2a8
READ of size 1 at 0x60200001b133 thread T0
    #0 0x51561f in VulnerableFunction1(unsigned char const*, unsigned long) /home/fanrong/Computer/libfuzzer-workshop/lessons/04/./vulnerable_functions.h:22:14
    #1 0x515d4e in LLVMFuzzerTestOneInput /home/fanrong/Computer/libfuzzer-workshop/lessons/04/first_fuzzer.cc:10:3
    #2 0x520993 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /home/fanrong/Computer/libfuzzer-workshop/libFuzzer/Fuzzer/FuzzerLoop.cpp:451:13
    #3 0x520bc0 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long) /home/fanrong/Computer/libfuzzer-workshop/libFuzzer/Fuzzer/FuzzerLoop.cpp:408:3
    #4 0x5215ab in fuzzer::Fuzzer::MutateAndTestOne() /home/fanrong/Computer/libfuzzer-workshop/libFuzzer/Fuzzer/FuzzerLoop.cpp:587:30

The fuzzer found a heap-buffer-overflow. To reproduce the crash, run the fuzzer on the saved crash file:

$ ./first_fuzzer crash-0eb8e4ed029b774d80f2b66408203801cb982a60
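Why does this crash? The guard in VulnerableFunction1 is size >= 3, but the body reads data[3]. That read is only reached when the first three bytes are "FUZ", and with a 3-byte input it lands one byte past the end of the buffer. A possible fix is sketched below. CheckedFunction1 is a name made up for this post, not something from the workshop.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical corrected variant of VulnerableFunction1 (illustrative name).
bool CheckedFunction1(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  // Reading data[0]..data[3] needs at least 4 bytes, not 3.
  if (size < 4) {
    return false;
  }
  return data[0] == 'F' &&
         data[1] == 'U' &&
         data[2] == 'Z' &&
         data[3] == 'Z';
}
```

With the bound changed to 4, the 3-byte input "FUZ" is rejected before any out-of-range read can happen.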

Fuzzer example 2
Now consider another function:

constexpr auto kMagicHeader = "ZN_2016";
constexpr std::size_t kMaxPacketLen = 1024;
constexpr std::size_t kMaxBodyLength = 1024 - sizeof(kMagicHeader);
bool VulnerableFunction2(const uint8_t* data, size_t size, bool verify_hash) {
  if (size < sizeof(kMagicHeader))
    return false;

  std::string header(reinterpret_cast<const char*>(data), sizeof(kMagicHeader));
  std::array<uint8_t, kMaxBodyLength> body;

  if (strcmp(kMagicHeader, header.c_str()))
    return false;

  auto target_hash = data[--size];

  if (size > kMaxPacketLen)
    return false;
  if (!verify_hash)
    return true;

  std::copy(data, data + size, body.data());
  auto real_hash = DummyHash(body);
  return real_hash == target_hash;
}

This example is slightly more complex. Test it with the simplest possible fuzz target (almost identical to the first one):

#include <stdint.h>
#include <stddef.h>
#include "vulnerable_functions.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  VulnerableFunction2(data, size, false);
  return 0;
}

Compile the fuzzer:

$ clang++ -g -std=c++11 -fsanitize=address -fsanitize-coverage=trace-pc-guard \
second_fuzzer.cc ../../libFuzzer/libFuzzer.a \
-o second_fuzzer

Run the fuzzer:

$ ./second_fuzzer corpus2
INFO: Seed: 844743323
INFO: Loaded 1 modules (39 guards): [0x773ea0, 0x773f3c),
Loading corpus dir: corpus2
INFO: -max_len is not provided, using 64
#0    READ units: 3
#3    INITED cov: 5 ft: 5 corp: 3/23b exec/s: 0 rss: 29Mb
#2097152    pulse  cov: 5 ft: 5 corp: 3/23b exec/s: 1048576 rss: 161Mb
#4194304    pulse  cov: 5 ft: 5 corp: 3/23b exec/s: 838860 rss: 293Mb
#8388608    pulse  cov: 5 ft: 5 corp: 3/23b exec/s: 838860 rss: 547Mb
#16777216    pulse  cov: 5 ft: 5 corp: 3/23b exec/s: 883011 rss: 549Mb
#33554432    pulse  cov: 5 ft: 5 corp: 3/23b exec/s: 958698 rss: 549Mb
...

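The output is not useful: coverage stays at 5 and no new inputs are added to the corpus. Now modify the fuzz target to call the function with both values of verify_hash: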

#include <stdint.h>
#include <stddef.h>
#include "vulnerable_functions.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  bool verify_hash_flags[] = { false, true };

  for (auto flag : verify_hash_flags)
    VulnerableFunction2(data, size, flag);
  return 0;
}

Compile the fuzzer:

$ clang++ -g -std=c++11 -fsanitize=address -fsanitize-coverage=trace-pc-guard \
third_fuzzer.cc ../../libFuzzer/libFuzzer.a \
-o third_fuzzer

Run the fuzzer on the same corpus:

$ ./third_fuzzer corpus2

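The fuzzer finds one new path, but that is still not much use. Note this line in the output:

INFO: -max_len is not provided, using 64

However, packets of the ZN_2016 protocol we are targeting can be up to this long:

constexpr std::size_t kMaxPacketLen = 1024;

So add the libFuzzer option -max_len=1024:

$ ./third_fuzzer corpus2 -max_len=1024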
INFO: Seed: 2241499835
INFO: Loaded 1 modules (41 guards): [0x773ee0, 0x773f84),
Loading corpus dir: corpus2
#0    READ units: 3
#3    INITED cov: 26 ft: 26 corp: 3/23b exec/s: 0 rss: 31Mb
=================================================================
==5664==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffedef9ecc8 at pc 0x0000004c57d1 bp 0x7ffedef9e710 sp 0x7ffedef9dec0
WRITE of size 1023 at 0x7ffedef9ecc8 thread T0
    #0 0x4c57d0 in __asan_memmove /home/fanrong/Downloads/src/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cc:470
    #1 0x5163f0 in unsigned char* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<unsigned char>(unsigned char const*, unsigned char const*, unsigned char*) /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_algobase.h:384:6
    #2 0x5162c2 in unsigned char* std::__copy_move_a<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_algobase.h:401:14
    #3 0x516220 in unsigned char* std::__copy_move_a2<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_algobase.h:438:18
    #4 0x516033 in unsigned char* std::copy<unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_algobase.h:470:15
    #5 0x515a52 in VulnerableFunction2(unsigned char const*, unsigned long, bool) /home/fanrong/Computer/libfuzzer-workshop/lessons/04/./vulnerable_functions.h:61:3
    #6 0x515edc in LLVMFuzzerTestOneInput /home/fanrong/Computer/libfuzzer-workshop/lessons/04/third_fuzzer.cc:13:5

A stack-buffer-overflow was found at vulnerable_functions.h:61:3!
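My reading of the root cause: kMagicHeader is declared with constexpr auto from a string literal, so its type is const char* and sizeof(kMagicHeader) is the size of a pointer (8 on this x64 machine). That makes body 1024 - 8 = 1016 bytes long, while the only length check lets through anything up to kMaxPacketLen = 1024. std::copy therefore writes past the end of body for larger inputs, which matches the "WRITE of size 1023" in the report. Below is a minimal sketch of a bounds-checked copy. kMagic, kPacketLen, kBodyLen, Body and CopyBodyChecked are names invented for this illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <algorithm>
#include <array>

// Declaring the magic as an array makes sizeof count its characters.
constexpr char kMagic[] = "ZN_2016";
static_assert(sizeof(kMagic) == 8, "7 characters plus the terminating NUL");

constexpr size_t kPacketLen = 1024;
constexpr size_t kBodyLen = kPacketLen - sizeof(kMagic);  // 1016 bytes

using Body = std::array<uint8_t, kBodyLen>;

// Copies the input into body only if it fits; returns false otherwise.
bool CopyBodyChecked(const uint8_t* data, size_t size, Body& body) {
  assert(data != nullptr || size == 0);
  if (size > body.size()) {
    return false;  // would overflow the destination buffer
  }
  std::copy(data, data + size, body.begin());
  return true;
}
```

The point is to compare the input length against the size of the destination buffer, not against a separate constant that happens to be larger.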
Fuzzer example 3
Look at the following function:

constexpr std::size_t kZn2016VerifyHashFlag = 0x0001000;

bool VulnerableFunction3(const uint8_t* data, size_t size, std::size_t flags) {
  bool verify_hash = flags & kZn2016VerifyHashFlag;
  return VulnerableFunction2(data, size, verify_hash);
}

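This is really just a wrapper around the previous function, but the key point is that flags can take a very large range of values. Enumerating every possible combination in the fuzzer is impractical, and there is no guarantee that new values will not be added later.
In this situation, we can use the data provided by libFuzzer to derive pseudo-random values for flags: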

#include <stdint.h>
#include <stddef.h>
#include "vulnerable_functions.h"
#include <functional>
#include <string>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string data_string(reinterpret_cast<const char*>(data), size);
  auto data_hash = std::hash<std::string>()(data_string);

  std::size_t flags = static_cast<size_t>(data_hash);
  VulnerableFunction3(data, size, flags);
  return 0;
}

Compile and run the fuzzer:

$ clang++ -g -std=c++11 -fsanitize=address -fsanitize-coverage=trace-pc-guard \
fourth_fuzzer.cc ../../libFuzzer/libFuzzer.a \
-o fourth_fuzzer
$ mkdir corpus3
$ ./fourth_fuzzer corpus3/ -max_len=1024

The same crash is found quickly, but the fuzzer is now generic with respect to the value of flags.
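Hashing the whole input works, but it means every mutation of the data also changes flags in a way the fuzzer cannot steer. One alternative, which I have not benchmarked against the hash approach, is to reserve the first few bytes of the input as the flags and pass the rest on as the payload. The sketch below shows only the splitting step. FlagsAndPayload and SplitFlags are hypothetical names, not part of libFuzzer or the workshop.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <cstring>

// Result of carving a flags value off the front of a fuzz input.
struct FlagsAndPayload {
  size_t flags;
  const uint8_t* payload;
  size_t payload_size;
};

// Uses the first sizeof(size_t) bytes as flags when the input is long enough;
// otherwise flags is 0 and the whole input is treated as payload.
FlagsAndPayload SplitFlags(const uint8_t* data, size_t size) {
  assert(data != nullptr || size == 0);
  FlagsAndPayload out{0, data, size};
  if (size >= sizeof(out.flags)) {
    // memcpy avoids reading a size_t through a possibly unaligned pointer.
    std::memcpy(&out.flags, data, sizeof(out.flags));
    out.payload = data + sizeof(out.flags);
    out.payload_size = size - sizeof(out.flags);
  }
  return out;
}
```

Inside LLVMFuzzerTestOneInput, the result would be passed on as VulnerableFunction3(parts.payload, parts.payload_size, parts.flags). The fuzzer can then mutate the flag bytes and the packet bytes independently.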
That is all for this introduction to basic libFuzzer usage; later posts will cover more practical uses.
References
https://github.com/Dor1s/libfuzzer-workshop
https://github.com/google/fuzzer-test-suite
http://llvm.org/docs/LibFuzzer.html
https://security.googleblog.com/2016/08/guided-in-process-fuzzing-of-chrome.html