Kernel Fuzz With Syzkaller
2024-03-05 10:05:02 · Fuzz

overview

This article was my final report for an elective course on Linux.

Looking at it now, this article cuts quite a few corners. While reading the source code I paid too much attention to the engineering side of the implementation and never touched the questions at the heart of fuzzing:

  • how syzkaller abstracts structured inputs
  • syzkaller's seed mutation strategy

I will fill in those parts when I find the time; what remains here, for now, is a rather plain walkthrough of the source code.

Overview of Linux kernel vulnerability discovery

There are currently three main techniques for automated software vulnerability discovery: symbolic execution, fuzzing, and taint analysis.

As the Linux kernel is a huge project with complex logic, symbolic execution and taint analysis are prohibitively expensive on it at runtime, so the widely used approach today is fuzzing.

Fuzzing is the process of generating large numbers of inputs from seeds, mutating the seeds based on runtime feedback to steer the creation of new corpus entries, and running the target under test on them.
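
To make the loop concrete, below is a minimal, self-contained sketch of coverage-guided fuzzing in Go. This is not syzkaller's code; run, mutate, and the toy "coverage" map stand in for a real executor and mutation engine:

package main

import (
	"fmt"
	"math/rand"
)

type input []byte

// mutate returns a copy of p with one random byte flipped (a toy mutation operator).
func mutate(p input) input {
	q := append(input(nil), p...)
	q[rand.Intn(len(q))] ^= byte(1 + rand.Intn(255))
	return q
}

func main() {
	// Toy target: "coverage" is how long a prefix of "bug" the input matches.
	run := func(p input) map[int]bool {
		cov := map[int]bool{}
		for i, c := range []byte("bug") {
			if i >= len(p) || p[i] != c {
				break
			}
			cov[i] = true
		}
		return cov
	}
	corpus := []input{input("aaa")} // initial seed
	seen := map[int]bool{}
	for i := 0; i < 100000; i++ {
		// Pick a random seed and mutate it.
		p := mutate(corpus[rand.Intn(len(corpus))])
		// Execute and check for new coverage.
		newCov := false
		for pc := range run(p) {
			if !seen[pc] {
				seen[pc] = true
				newCov = true
			}
		}
		// Inputs that reach new code join the corpus as future seeds.
		if newCov {
			corpus = append(corpus, p)
			fmt.Printf("iter %d: new coverage, corpus size %d\n", i, len(corpus))
		}
	}
}

Real fuzzers differ in what they execute and how they mutate, but they keep this seed/mutate/feedback shape.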

The de facto kernel fuzzing framework today is syzkaller, open-sourced by Google. syzkaller is still a coverage-guided fuzzer; what sets it apart is that, because the kernel's interface to user space is a set of system calls, syzkaller models a test input as a sequence of syscalls, and with its two-sided syz-manager/syz-fuzzer architecture it discovers kernel bugs quickly.
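
Concretely, a syzkaller input (a "program") is a short sequence of syscalls in syzkaller's textual format, where resources such as file descriptors flow from one call to the next. An illustrative, made-up example:

r0 = open(&(0x7f0000000000)='./file0\x00', 0x2, 0x1ff)
write(r0, &(0x7f0000001000)="68656c6c6f", 0x5)
close(r0)

Here r0 names the fd returned by open so that later calls can reference it; generation and mutation operate at this structured level rather than on raw bytes.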

TODO

  • [ ] Mutation strategy
  • [ ] Corpus generation
  • [ ] Writing syzlang descriptions

syzkaller source code analysis

The syzkaller source tree has three core components:

  • syz-fuzzer
  • syz-manager
  • syz-executor

The syz-manager process starts, monitors, and restarts multiple VM instances, and launches a syz-fuzzer process inside each VM. It is responsible for long-term storage of the input corpus and of crash reports, and normally runs on the host machine.

The syz-fuzzer process runs inside the VM under test. It drives the fuzzing process (input generation, mutation, minimization, and so on) and sends inputs that trigger new coverage back to the syz-manager process over RPC. It is also responsible for starting syz-executor processes.

Each syz-executor process executes one input (a sequence of syscalls). It accepts programs to execute from the syz-fuzzer process and sends the results back. It is written in C++, compiled as a static binary, and communicates over shared memory.

syz-fuzzer

main

Fuzzer initialization

debug.SetGCPercent(50)

var (
	flagName     = flag.String("name", "test", "unique name for manager")
	flagOS       = flag.String("os", runtime.GOOS, "target OS")
	flagArch     = flag.String("arch", runtime.GOARCH, "target arch")
	flagManager  = flag.String("manager", "", "manager rpc address")
	flagProcs    = flag.Int("procs", 1, "number of parallel test processes")
	flagOutput   = flag.String("output", "stdout", "write programs to none/stdout/dmesg/file")
	flagTest     = flag.Bool("test", false, "enable image testing mode")      // used by syz-ci
	flagRunTest  = flag.Bool("runtest", false, "enable program testing mode") // used by pkg/runtest
	flagRawCover = flag.Bool("raw_cover", false, "fetch raw coverage")
)
defer tool.Init()()

First, the command-line flags are defined and parsed.

shutdown := make(chan struct{})
osutil.HandleInterrupts(shutdown)
go func() {
	// Handle GCE preemption.
	<-shutdown
	log.Logf(0, "SYZ-FUZZER: PREEMPTED")
	os.Exit(1)
}()

A goroutine is then started to watch the shutdown channel; when a shutdown signal arrives, the fuzzer logs that it was preempted and exits.
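
osutil.HandleInterrupts is essentially the standard os/signal pattern; a rough stdlib-only equivalent looks like this (a sketch, not syzkaller's implementation):

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	shutdown := make(chan os.Signal, 1)
	// Deliver SIGINT/SIGTERM to the channel instead of killing the process.
	signal.Notify(shutdown, syscall.SIGINT, syscall.SIGTERM)
	<-shutdown
	log.Print("interrupted, exiting")
	os.Exit(1)
}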

Connecting to the manager
log.Logf(1, "connecting to manager...")
a := &rpctype.ConnectArgs{
	Name:        *flagName,
	MachineInfo: machineInfo,
	Modules:     modules,
}
r := &rpctype.ConnectRes{} // response of the Manager.Connect RPC
if err := manager.Call("Manager.Connect", a, r); err != nil {
	log.SyzFatalf("failed to call Manager.Connect(): %v ", err)
}
featureFlags, err := csource.ParseFeaturesFlags("none", "none", true) // handle feature flags
if err != nil {
	log.SyzFatalf("%v", err)
}
if r.CoverFilterBitmap != nil {
	if err := osutil.WriteFile("syz-cover-bitmap", r.CoverFilterBitmap); err != nil {
		log.SyzFatalf("failed to write syz-cover-bitmap: %v", err)
	}
}
if r.CheckResult == nil {
	checkArgs.gitRevision = r.GitRevision
	checkArgs.targetRevision = r.TargetRevision
	checkArgs.enabledCalls = r.EnabledCalls
	checkArgs.allSandboxes = r.AllSandboxes
	checkArgs.featureFlags = featureFlags
	r.CheckResult, err = checkMachine(checkArgs)
	if err != nil {
		if r.CheckResult == nil {
			r.CheckResult = new(rpctype.CheckArgs)
		}
		r.CheckResult.Error = err.Error()
	}
	r.CheckResult.Name = *flagName
	if err := manager.Call("Manager.Check", r.CheckResult, nil); err != nil {
		log.SyzFatalf("Manager.Check call failed: %v", err)
	}
	if r.CheckResult.Error != "" {
		log.SyzFatalf("%v", r.CheckResult.Error)
	}
} else {
	target.UpdateGlobs(r.CheckResult.GlobFiles)
	if err = host.Setup(target, r.CheckResult.Features, featureFlags, config.Executor); err != nil {
		log.SyzFatalf("%v", err)
	}
}
log.Logf(0, "syscalls: %v", len(r.CheckResult.EnabledCalls[sandbox]))
for _, feat := range r.CheckResult.Features.Supported() {
	log.Logf(0, "%v: %v", feat.Name, feat.Reason)
}
createIPCConfig(r.CheckResult.Features, config)

Next, the fuzzer connects to syz-manager. If the manager has no cached machine-check result, the fuzzer probes the VM (checkMachine) and reports the result back via Manager.Check; otherwise it applies the cached result, configures execution options, and prepares the sandbox environment.
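
manager.Call is a thin wrapper around Go's net/rpc request/response pattern. A minimal sketch of the same shape, with illustrative types standing in for rpctype.ConnectArgs/ConnectRes (not syzkaller's actual code):

package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Illustrative stand-ins for rpctype.ConnectArgs / rpctype.ConnectRes.
type ConnectArgs struct{ Name string }
type ConnectRes struct{ EnabledCalls int }

type Manager struct{}

// Connect has the canonical net/rpc method signature: (args, reply) -> error.
func (m *Manager) Connect(a *ConnectArgs, r *ConnectRes) error {
	r.EnabledCalls = 42
	return nil
}

func main() {
	srv := rpc.NewServer()
	if err := srv.Register(new(Manager)); err != nil {
		panic(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go srv.Accept(ln)

	client, err := rpc.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	a := &ConnectArgs{Name: "vm-0"}
	r := &ConnectRes{}
	if err := client.Call("Manager.Connect", a, r); err != nil {
		panic(err)
	}
	fmt.Println("enabled calls:", r.EnabledCalls)
}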

fuzzer process

log.Logf(0, "starting %v fuzzer processes", *flagProcs)
for pid := 0; pid < *flagProcs; pid++ {
proc, err := newProc(fuzzer, pid)
if err != nil {
log.SyzFatalf("failed to create proc: %v", err)
}
fuzzer.procs = append(fuzzer.procs, proc)
go proc.loop()
}

Next, N fuzzing goroutines ("procs") are started according to the -procs flag; each proc drives its own syz-executor process inside the VM (several procs share one VM, it is not one VM per proc). The main goroutine then enters the poll loop:

fuzzer.pollLoop()

proc.loop

proc.loop is the core loop of each fuzzing proc.

func (proc *Proc) loop() {
	generatePeriod := 100
	if proc.fuzzer.config.Flags&ipc.FlagSignal == 0 {
		// If we don't have real coverage signal, generate programs more frequently
		// because fallback signal is weak.
		generatePeriod = 2
	}
	// This controls how often brand-new test programs are generated.

This checks whether real coverage signal feedback is enabled in the config.

If it is not, generatePeriod is set to 2, meaning a brand-new test case is generated every 2 iterations.

The reasoning: without real coverage signal, only the weak fallback signal is available, so the fuzzer compensates by generating fresh test cases more often.

	for i := 0; ; i++ {
		item := proc.fuzzer.workQueue.dequeue()
		if item != nil {
			switch item := item.(type) {
			case *WorkTriage:
				proc.triageInput(item)
			case *WorkCandidate:
				proc.execute(proc.execOpts, item.p, item.flags, StatCandidate)
			case *WorkSmash:
				proc.smashInput(item)
			default:
				log.SyzFatalf("unknown work type: %#v", item)
			}
			continue
		}

Each iteration first tries to dequeue an item from the workQueue and dispatches on its type: WorkTriage verifies and minimizes an input that produced new coverage, WorkCandidate executes a program handed down by the manager, and WorkSmash applies extra mutations to a freshly added corpus program.
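
A minimal sketch of this typed work-item dispatch pattern (the types here are stand-ins, not syzkaller's definitions):

package main

import "fmt"

// Stand-ins for syzkaller's work item types.
type WorkTriage struct{ prog string }
type WorkCandidate struct{ prog string }
type WorkSmash struct{ prog string }

func dispatch(item interface{}) {
	switch it := item.(type) {
	case *WorkTriage:
		fmt.Println("triage (verify/minimize):", it.prog)
	case *WorkCandidate:
		fmt.Println("candidate from manager:", it.prog)
	case *WorkSmash:
		fmt.Println("smash (extra mutations):", it.prog)
	default:
		panic(fmt.Sprintf("unknown work type: %#v", item))
	}
}

func main() {
	queue := []interface{}{&WorkTriage{"prog A"}, &WorkCandidate{"prog B"}, &WorkSmash{"prog C"}}
	for _, item := range queue {
		dispatch(item)
	}
}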

		ct := proc.fuzzer.choiceTable
		fuzzerSnapshot := proc.fuzzer.snapshot()
		if len(fuzzerSnapshot.corpus) == 0 || i%generatePeriod == 0 {
			// Generate a brand-new program.
			p := proc.fuzzer.target.Generate(proc.rnd, prog.RecommendedCalls, ct)
			log.Logf(1, "#%v: generated", proc.pid)
			proc.executeAndCollide(proc.execOpts, p, ProgNormal, StatGenerate)
		} else {
			// Mutate an existing corpus program.
			p := fuzzerSnapshot.chooseProgram(proc.rnd).Clone()
			p.Mutate(proc.rnd, prog.RecommendedCalls, ct, proc.fuzzer.noMutate, fuzzerSnapshot.corpus)
			log.Logf(1, "#%v: mutated", proc.pid)
			proc.executeAndCollide(proc.execOpts, p, ProgNormal, StatFuzz)
		}
	}
}


The proc takes the choice table and a snapshot copy of the corpus.

It then chooses between two paths:

  • if the corpus is empty, or once every generatePeriod iterations (100 by default), Generate creates a completely new random program;
  • otherwise, a randomly chosen corpus program is cloned and Mutate derives a new program from it.

The resulting program is run via executeAndCollide: it is executed normally first, then re-run in "collide" mode, which issues some calls concurrently to shake out race conditions.
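
The "collide" part deserves a note: after the normal run, syzkaller re-executes the program with some calls issued concurrently so that races between syscalls have a chance to fire. A toy illustration of the idea in plain Go (not syzkaller's implementation, which does this inside the executor):

package main

import (
	"fmt"
	"sync"
)

func main() {
	// Stand-ins for the syscalls of one program.
	calls := []func(){
		func() { fmt.Println("openat") },
		func() { fmt.Println("ioctl") },
		func() { fmt.Println("read") },
		func() { fmt.Println("close") },
	}
	// Instead of running the calls strictly in sequence, run adjacent
	// pairs concurrently to provoke races between them.
	for i := 0; i < len(calls); i += 2 {
		var wg sync.WaitGroup
		for j := i; j < i+2 && j < len(calls); j++ {
			wg.Add(1)
			go func(f func()) {
				defer wg.Done()
				f()
			}(calls[j])
		}
		wg.Wait()
	}
}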

fuzzer.pollLoop

After starting these goroutines, the main goroutine's remaining job is to serve their requests and to handle the RPC communication with syz-manager. This is done in pollLoop(), which never returns; its core is an infinite loop that does the following:

  • Wait until one of two channels delivers: ticker (a timer that fires every 3 s) or fuzzer.needPoll.
  • If the wakeup came from fuzzer.needPoll, or more than 10 s have passed since the last poll:
  • Check whether the workQueue wants new candidates (it holds fewer candidates than there are executors); if it does not, and the wakeup came from fuzzer.needPoll, skip and wait until more than 10 s have passed since the last poll.
    Collect executor statistics and call poll() to fetch new candidates from syz-manager over RPC.
func (fuzzer *Fuzzer) pollLoop() {
	var execTotal uint64
	var lastPoll time.Time
	var lastPrint time.Time
	ticker := time.NewTicker(3 * time.Second * fuzzer.timeouts.Scale).C
	for {
		poll := false
		select {
		case <-ticker:
		case <-fuzzer.needPoll:
			poll = true
		}
		// Wait until either the ticker (fires every 3s) or the
		// fuzzer.needPoll channel delivers.
		if fuzzer.outputType != OutputStdout && time.Since(lastPrint) > 10*time.Second*fuzzer.timeouts.Scale {
			// Print a liveness message at most every 10s.
			log.Logf(0, "alive, executed %v", execTotal)
			lastPrint = time.Now()
		}
		if poll || time.Since(lastPoll) > 10*time.Second*fuzzer.timeouts.Scale {
			needCandidates := fuzzer.workQueue.wantCandidates()
			if poll && !needCandidates {
				continue
			}
			// Check whether the workQueue needs new candidates; if not,
			// and this wakeup came from fuzzer.needPoll, wait until more
			// than 10s have passed since the last poll.
			stats := make(map[string]uint64)
			for _, proc := range fuzzer.procs {
				stats["exec total"] += atomic.SwapUint64(&proc.env.StatExecs, 0)
				stats["executor restarts"] += atomic.SwapUint64(&proc.env.StatRestarts, 0)
			}
			// Collect executor stats and call poll() to fetch new
			// candidates from syz-manager over RPC.
			for stat := Stat(0); stat < StatCount; stat++ {
				v := atomic.SwapUint64(&fuzzer.stats[stat], 0)
				stats[statNames[stat]] = v
				execTotal += v
			}
			if !fuzzer.poll(needCandidates, stats) {
				lastPoll = time.Now()
			}
		}
	}
}

syz-manager

main

func main() {
	if prog.GitRevision == "" {
		log.Fatalf("bad syz-manager build: build with make, run bin/syz-manager")
	}
	flag.Parse() // parse command-line flags
	log.EnableLogCaching(1000, 1<<20)
	cfg, err := mgrconfig.LoadFile(*flagConfig)
	if err != nil {
		log.Fatalf("%v", err)
	}
	if cfg.DashboardAddr != "" {
		// This lets better distinguish logs of individual syz-manager instances.
		log.SetName(cfg.Name)
	}
	RunManager(cfg) // the real startup function
}

main mostly parses the flags and the config file, then calls RunManager.

RunManager

var vmPool *vm.Pool
// Type "none" is a special case for debugging/development when manager
// does not start any VMs, but instead you start them manually
// and start syz-fuzzer there.
if cfg.Type != "none" {
	var err error
	vmPool, err = vm.Create(cfg, *flagDebug)
	if err != nil {
		log.Fatalf("%v", err)
	}
}

First a vmPool is created to manage the VM resources.

crashdir := filepath.Join(cfg.Workdir, "crashes")
osutil.MkdirAll(crashdir)

reporter, err := report.NewReporter(cfg)
if err != nil {
	log.Fatalf("%v", err)
}

This creates the crashes directory and a reporter for parsing kernel crash output. The manager then preloads the input corpus, initializes its statistics, starts an HTTP server that presents the fuzzing results as a web page on a local port, and collects the files it uses:

mgr.preloadCorpus()    // preload the input corpus
mgr.initStats()        // initialize the stats counters
mgr.initHTTP()         // start the HTTP server
mgr.collectUsedFiles() // collect the files the manager depends on
go func() {
	for lastTime := time.Now(); ; {
		time.Sleep(10 * time.Second)
		now := time.Now()
		diff := now.Sub(lastTime)
		lastTime = now
		mgr.mu.Lock()
		if mgr.firstConnect.IsZero() {
			mgr.mu.Unlock()
			continue
		}
		mgr.fuzzingTime += diff * time.Duration(atomic.LoadUint32(&mgr.numFuzzing))
		executed := mgr.stats.execTotal.get()
		crashes := mgr.stats.crashes.get()
		corpusCover := mgr.stats.corpusCover.get()
		corpusSignal := mgr.stats.corpusSignal.get()
		maxSignal := mgr.stats.maxSignal.get()
		triageQLen := len(mgr.candidates)
		mgr.mu.Unlock()
		numReproducing := atomic.LoadUint32(&mgr.numReproducing)
		numFuzzing := atomic.LoadUint32(&mgr.numFuzzing)

		log.Logf(0, "VMs %v, executed %v, cover %v, signal %v/%v, crashes %v, repro %v, triageQLen %v",
			numFuzzing, executed, corpusCover, corpusSignal, maxSignal, crashes, numReproducing, triageQLen)
	}
}()

This anonymous goroutine does the following:

  1. Loops with a 10-second period.
  2. Computes the time diff since the previous iteration and accrues it into the total fuzzing time.
  3. Reads the manager's statistics:
    • execTotal: total number of executed programs
    • crashes: number of observed crashes
    • corpusCover: code coverage reached by the corpus
    • corpusSignal/maxSignal: total and maximum coverage signal
    • triageQLen: number of candidate programs waiting to be triaged
  4. Loads the number of VMs currently fuzzing and the number of crashes being reproduced.
  5. Logs all of the above once per period.
osutil.HandleInterrupts(vm.Shutdown)
if mgr.vmPool == nil {
	log.Logf(0, "no VMs started (type=none)")
	log.Logf(0, "you are supposed to start syz-fuzzer manually as:")
	log.Logf(0, "syz-fuzzer -manager=manager.ip:%v [other flags as necessary]", mgr.serv.port)
	<-vm.Shutdown
	return
}

This installs the interrupt handler that triggers VM shutdown. When the manager is configured with type "none" it starts no VMs itself; it prints how to launch syz-fuzzer manually and simply waits for the shutdown signal.

mgr.vmLoop()

Finally, the manager enters its main loop, which schedules all the work.

manager.vmLoop

log.Logf(0, "booting test machines...")
log.Logf(0, "wait for the connection from test machine...")
instancesPerRepro := 3
vmCount := mgr.vmPool.Count()
maxReproVMs := vmCount - mgr.cfg.FuzzingVMs
if instancesPerRepro > maxReproVMs && maxReproVMs > 0 {
instancesPerRepro = maxReproVMs
}
instances := SequentialResourcePool(vmCount, 10*time.Second*mgr.cfg.Timeouts.Scale)
runDone := make(chan *RunResult, 1)
pendingRepro := make(map[*Crash]bool)
reproducing := make(map[string]bool)
var reproQueue []*Crash
reproDone := make(chan *ReproResult, 1)
stopPending := false

First, some state is initialized for communication and reproduction: how many VMs one repro may use, a pool of VM instances, and the queues and maps that track pending and in-flight repros.

mgr.mu.Lock()
phase := mgr.phase
mgr.mu.Unlock()

The manager's phase is read while holding the lock.

for crash := range pendingRepro {
	if reproducing[crash.Title] {
		continue
	}
	delete(pendingRepro, crash)
	if !mgr.needRepro(crash) {
		continue
	}
	log.Logf(1, "loop: add to repro queue '%v'", crash.Title)
	reproducing[crash.Title] = true
	reproQueue = append(reproQueue, crash)
}
log.Logf(1, "loop: phase=%v shutdown=%v instances=%v/%v %+v repro: pending=%v reproducing=%v queued=%v",
	phase, shutdown == nil, instances.Len(), vmCount, instances.Snapshot(),
	len(pendingRepro), len(reproducing), len(reproQueue))

Crashes that are not already being reproduced, and that still need a repro, are moved into the repro queue.

canRepro := func() bool {
	return phase >= phaseTriagedHub && len(reproQueue) != 0 &&
		(int(atomic.LoadUint32(&mgr.numReproducing))+1)*instancesPerRepro <= maxReproVMs
} // closure deciding whether a new repro job can start now

if shutdown != nil {
	// While a repro job can start, take VMs and launch it.
	for canRepro() {
		vmIndexes := instances.Take(instancesPerRepro)
		// take a set of available VMs
		if vmIndexes == nil {
			break
		}
		last := len(reproQueue) - 1
		crash := reproQueue[last]
		reproQueue[last] = nil
		reproQueue = reproQueue[:last]
		atomic.AddUint32(&mgr.numReproducing, 1)
		log.Logf(0, "loop: starting repro of '%v' on instances %+v", crash.Title, vmIndexes)
		go func() {
			reproDone <- mgr.runRepro(crash, vmIndexes, instances.Put)
		}()
		// launch a goroutine that reproduces the crash on the taken VMs
	}
	for !canRepro() {
		idx := instances.TakeOne()
		if idx == nil {
			break
		}
		log.Logf(1, "loop: starting instance %v", *idx)
		go func() {
			crash, err := mgr.runInstance(*idx)
			runDone <- &RunResult{*idx, crash, err}
		}() // otherwise use the VM for an ordinary fuzzing run
	}
}
wait:
	select {
	case <-instances.Freed:
		// an instance was released
	case stopRequest <- true:
		log.Logf(1, "loop: issued stop request")
		stopPending = true
	case res := <-runDone:
		log.Logf(1, "loop: instance %v finished, crash=%v", res.idx, res.crash != nil)
		if res.err != nil && shutdown != nil {
			log.Logf(0, "%v", res.err)
		}
		stopPending = false
		instances.Put(res.idx)
		// If the VM was shut down (e.g. qemu got signal 2), treat it
		// as a lost connection rather than a crash.
		if shutdown != nil && res.crash != nil {
			needRepro := mgr.saveCrash(res.crash)
			if needRepro {
				log.Logf(1, "loop: add pending repro for '%v'", res.crash.Title)
				pendingRepro[res.crash] = true
			}
		}
	case res := <-reproDone:
		atomic.AddUint32(&mgr.numReproducing, ^uint32(0))
		crepro := false
		title := ""
		if res.repro != nil {
			crepro = res.repro.CRepro
			title = res.repro.Report.Title
		}
		log.Logf(0, "loop: repro on %+v finished '%v', repro=%v crepro=%v desc='%v'",
			res.instances, res.report0.Title, res.repro != nil, crepro, title)
		if res.err != nil {
			reportReproError(res.err)
		}
		delete(reproducing, res.report0.Title)
		if res.repro == nil {
			if !res.hub {
				mgr.saveFailedRepro(res.report0, res.stats)
			}
		} else {
			mgr.saveRepro(res)
		}
	case <-shutdown:
		log.Logf(1, "loop: shutting down...")
		shutdown = nil
	case crash := <-mgr.hubReproQueue:
		log.Logf(1, "loop: get repro from hub")
		pendingRepro[crash] = true
	case reply := <-mgr.needMoreRepros:
		reply <- phase >= phaseTriagedHub &&
			len(reproQueue)+len(pendingRepro)+len(reproducing) == 0
		goto wait
	case reply := <-mgr.reproRequest:
		repros := make(map[string]bool)
		for title := range reproducing {
			repros[title] = true
		}
		reply <- repros
		goto wait
	}
}

This select is the core of the manager's VM scheduling; each case handles one kind of event (a simplified sketch of the event-loop pattern follows the list):

  1. instances.Freed: a VM instance became available.
  2. stopRequest: a request to stop a VM was issued.
  3. runDone: a fuzzing run on a VM finished:
    • if the run failed, log the error
    • return the instance to the free pool
    • if the run produced a crash, save it and add it to the pending-repro queue
  4. reproDone: a repro attempt finished:
    • update the count of running repro jobs
    • log the repro result
    • if the repro failed, record that
    • remove the title from the reproducing set
    • save information according to the repro result
  5. shutdown: a shutdown signal arrived; begin shutting down.
  6. hubReproQueue: a crash received from the hub that still needs a repro.
  7. needMoreRepros: reply whether no crashes are left to reproduce.
  8. reproRequest: reply with the set of crashes currently being reproduced.
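
The overall shape, one goroutine owning all scheduling state and reacting to completion events over channels, is a common Go pattern. A stripped-down sketch (illustrative only, not syzkaller's code):

package main

import "fmt"

type result struct {
	idx     int
	crashed bool
}

func main() {
	const vms = 3
	free := make(chan int, vms) // pool of idle VM indexes
	done := make(chan result)   // completion events from instance goroutines
	for i := 0; i < vms; i++ {
		free <- i
	}
	for finished := 0; finished < vms; {
		select {
		case idx := <-free:
			// An instance is idle: launch a run on it.
			go func(idx int) {
				done <- result{idx: idx, crashed: idx == 1}
			}(idx)
		case r := <-done:
			// A run finished: record the result. The real vmLoop would
			// return the VM to the pool and possibly start a repro here.
			finished++
			fmt.Printf("instance %d finished, crash=%v\n", r.idx, r.crashed)
		}
	}
}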

syz-executor

syz-executor is the executor, written in C++, that actually runs the test programs.

	if (argc == 2 && strcmp(argv[1], "version") == 0) {
		puts(GOOS " " GOARCH " " SYZ_REVISION " " GIT_REVISION);
		return 0;
	}
	if (argc >= 2 && strcmp(argv[1], "setup") == 0) {
		setup_features(argv + 2, argc - 2);
		return 0;
	}
	if (argc >= 2 && strcmp(argv[1], "leak") == 0) {
#if SYZ_HAVE_LEAK_CHECK
		check_leaks(argv + 2, argc - 2);
#else
		fail("leak checking is not implemented");
#endif
		return 0;
	}
	if (argc >= 2 && strcmp(argv[1], "setup_kcsan_filterlist") == 0) {
#if SYZ_HAVE_KCSAN
		setup_kcsan_filterlist(argv + 2, argc - 2, true);
#else
		fail("KCSAN is not implemented");
#endif
		return 0;
	}
	if (argc == 2 && strcmp(argv[1], "test") == 0)
		return run_tests();

	if (argc < 2 || strcmp(argv[1], "exec") != 0) {
		fprintf(stderr, "unknown command");
		return 1;
	}

main first dispatches on the subcommand in argv[1] (version, setup, leak, setup_kcsan_filterlist, test); anything other than exec is rejected.

	start_time_ms = current_time_ms();
	// record the fuzzing start time
	os_init(argc, argv, (char*)SYZ_DATA_OFFSET, SYZ_NUM_PAGES * SYZ_PAGE_SIZE);
	// OS-specific initialization, including the data area used for syscall arguments
	current_thread = &threads[0];

#if SYZ_EXECUTOR_USES_SHMEM
	void* mmap_out = mmap(NULL, kMaxInput, PROT_READ, MAP_PRIVATE, kInFd, 0);
#else
	void* mmap_out = mmap(NULL, kMaxInput, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); // anonymous input buffer when shmem is not used
#endif
	if (mmap_out == MAP_FAILED)
		fail("mmap of input file failed");
	input_data = static_cast<char*>(mmap_out);
#if SYZ_EXECUTOR_USES_SHMEM
mmap_output(kInitialOutput);
// Prevent test programs to mess with these fds.
// Due to races in collider mode, a program can e.g. ftruncate one of these fds,
// which will cause fuzzer to crash.
close(kInFd);
#if !SYZ_EXECUTOR_USES_FORK_SERVER
close(kOutFd);
#endif
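
The shared-memory channel mentioned earlier comes down to both processes mapping the same file descriptors (kInFd/kOutFd) that syz-fuzzer passes to the executor. A minimal Go sketch of file-backed shared memory on Linux (the file name and size are made up for the demo):

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.OpenFile("/tmp/shm-demo", os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	const size = 1 << 20 // 1 MiB region
	if err := f.Truncate(size); err != nil {
		panic(err)
	}
	// Any process mapping the same file with MAP_SHARED sees these writes.
	buf, err := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(buf)
	copy(buf, "written via the mapping")
	fmt.Println(string(buf[:23]))
}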

Next come a few more pieces of setup:

	use_temporary_dir();    // create a temporary working directory
	install_segv_handler(); // install segv_handler for SIGSEGV and SIGBUS
	setup_control_pipes();  // redirect stdin/stdout to the control pipes used for requests and error reporting
#if SYZ_EXECUTOR_USES_FORK_SERVER
	receive_handshake(); // confirm the connection with syz-fuzzer
#else
	receive_execute(); // read the execute_req request from the pipe
#endif

Then coverage collection (KCOV) is set up:

	if (flag_coverage) {
		int create_count = kCoverDefaultCount, mmap_count = create_count;
		if (flag_delay_kcov_mmap) {
			create_count = kCoverOptimizedCount;
			mmap_count = kCoverOptimizedPreMmap;
		}
		if (create_count > kMaxThreads)
			create_count = kMaxThreads;
		// compute how many kcov file descriptors to create
		for (int i = 0; i < create_count; i++) {
			threads[i].cov.fd = kCoverFd + i;
			// open a coverage fd for this thread
			cover_open(&threads[i].cov, false);
			if (i < mmap_count) {
				// Pre-mmap coverage collection for some threads. This should be enough for almost
				// all programs, for the remaining few ones coverage will be set up when it's needed.
				thread_mmap_cover(&threads[i]);
			}
		}
		char sep = '/';
#if GOOS_windows
		sep = '\\';
#endif
		char filename[1024] = {0};
		char* end = strrchr(argv[0], sep);
		size_t len = end - argv[0];
		strncpy(filename, argv[0], len + 1);
		strncat(filename, "syz-cover-bitmap", 17);
		filename[sizeof(filename) - 1] = '\0';
		init_coverage_filter(filename);
		// load the coverage filter bitmap file next to the executor binary
	}

Then the execution sandbox is created:


	int status = 0;
	if (flag_sandbox_none)
		status = do_sandbox_none();
#if SYZ_HAVE_SANDBOX_SETUID
	else if (flag_sandbox_setuid)
		status = do_sandbox_setuid(); // setuid sandbox
#endif
#if SYZ_HAVE_SANDBOX_NAMESPACE
	else if (flag_sandbox_namespace)
		status = do_sandbox_namespace(); // namespace sandbox
#endif
#if SYZ_HAVE_SANDBOX_ANDROID
	else if (flag_sandbox_android)
		status = do_sandbox_android(sandbox); // android sandbox
#endif
	else
		fail("unknown sandbox type");

Finally, the exit-status handling:

#if SYZ_EXECUTOR_USES_FORK_SERVER
	fprintf(stderr, "loop exited with status %d\n", status);
	// Other statuses happen when fuzzer processes manages to kill loop, e.g. with:
	// ptrace(PTRACE_SEIZE, 1, 0, 0x100040)
	if (status != kFailStatus)
		status = 0;
	// If an external sandbox process wraps executor, the out pipe will be closed
	// before the sandbox process exits this will make ipc package kill the sandbox.
	// As the result sandbox process will exit with exit status 9 instead of the executor
	// exit status (notably kFailStatus). So we duplicate the exit status on the pipe.
	reply_execute(status);
	doexit(status);
	// Unreachable.
	return 1;
#else
	reply_execute(status);
	return status;
#endif
}

Hunting bugs with syzkaller

Environment setup

Build syzkaller:

$ go get -u -d github.com/google/syzkaller/prog
$ cd gopath/src/github.com/google/syzkaller/
$ make

The target kernel version is linux-6.5.4.

Enable the relevant debug options:

CONFIG_KCOV=y
CONFIG_DEBUG_INFO=y
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_CONFIGFS_FS=y
CONFIG_SECURITYFS=y
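
Since the bug analyzed below is detected via kmemleak, the kernel leak detector has to be compiled in as well, and syzkaller's docs recommend instrumenting the whole kernel for KCOV. On a typical setup that means at least the following additional options (exact names may vary slightly across kernel versions):

CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KMEMLEAK=y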

Create the VM image:

$ sudo apt-get install debootstrap
$ mkdir image
$ cd image
$ wget https://raw.githubusercontent.com/google/syzkaller/master/tools/create-image.sh -O create-image.sh
$ chmod +x create-image.sh
$ ./create-image.sh

The qemu startup script and the syz-manager config file qemu.cfg used here are shown in the screenshots below:

(screenshot: qemu startup script)

(screenshot: qemu.cfg)
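
For reference, a syz-manager qemu config generally has the following shape (a sketch based on the syzkaller documentation; all paths are placeholders, not the exact values used in this setup):

{
	"target": "linux/amd64",
	"http": "127.0.0.1:56741",
	"workdir": "/path/to/workdir",
	"kernel_obj": "/path/to/linux-6.5.4",
	"image": "/path/to/image/bullseye.img",
	"sshkey": "/path/to/image/bullseye.id_rsa",
	"syzkaller": "/path/to/gopath/src/github.com/google/syzkaller",
	"procs": 8,
	"type": "qemu",
	"vm": {
		"count": 4,
		"kernel": "/path/to/linux-6.5.4/arch/x86/boot/bzImage",
		"cpu": 2,
		"mem": 2048
	}
}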

Running syzkaller

After three to four days of running, the results were as follows:

(screenshot: syz-manager stats after several days of fuzzing)

Bug analysis

I picked "memory leak in iov_iter_extract_pages" for analysis.

First, look at the call stack captured when the leak was reported:

(screenshot: kmemleak report with the allocating call stack)

Locate the function (the report points at iov_iter_extract_user_pages, which iov_iter_extract_pages calls for user-backed iterators): it extracts a run of contiguous pages from the iterator and pins them.

// lib/iov_iter.c
static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
					   struct page ***pages,
					   size_t maxsize,
					   unsigned int maxpages,
					   iov_iter_extraction_t extraction_flags,
					   size_t *offset0)
{
	unsigned long addr;
	unsigned int gup_flags = 0;
	size_t offset;
	int res;

	if (i->data_source == ITER_DEST)
		gup_flags |= FOLL_WRITE;
	if (extraction_flags & ITER_ALLOW_P2PDMA)
		gup_flags |= FOLL_PCI_P2PDMA;
	if (i->nofault)
		gup_flags |= FOLL_NOFAULT;

	addr = first_iovec_segment(i, &maxsize);
	*offset0 = offset = addr % PAGE_SIZE;
	addr &= PAGE_MASK;
	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
	if (!maxpages)
		return -ENOMEM;
	// compute the user address range to extract and make sure the pages array is big enough
	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
	// pin the pages in that address range
	if (unlikely(res <= 0))
		return res;
	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
	// clamp to the number of pages actually pinned
	iov_iter_advance(i, maxsize);
	// advance the iterator
	return maxsize;
}

The bug syzkaller reported is a memory leak, that is, memory that was allocated but never freed.

Among the three helpers called here (want_pages_array, pin_user_pages_fast, iov_iter_advance), look for the one that allocates memory:

static int want_pages_array(struct page ***res, size_t size,
			    size_t start, unsigned int maxpages)
{
	unsigned int count = DIV_ROUND_UP(size + start, PAGE_SIZE);

	if (count > maxpages)
		count = maxpages;
	WARN_ON(!count); // caller should've prevented that
	if (!*res) {
		*res = kvmalloc_array(count, sizeof(struct page *), GFP_KERNEL);
		if (!*res)
			return 0;
	}
	return count;
}

Only want_pages_array performs a heap allocation (via kvmalloc_array).

So if the subsequent pinning fails:

if (unlikely(res <= 0))
	return res;

the function returns early with the error, but the pages array allocated just before is never freed, which causes the memory leak.

The C reproducer generated by syzkaller:

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(uint64_t ms)
{
usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts))
exit(1);
return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
char buf[1024];
va_list args;
va_start(args, what);
vsnprintf(buf, sizeof(buf), what, args);
va_end(args);
buf[sizeof(buf) - 1] = 0;
int len = strlen(buf);
int fd = open(file, O_WRONLY | O_CLOEXEC);
if (fd == -1)
return false;
if (write(fd, buf, len) != len) {
int err = errno;
close(fd);
errno = err;
return false;
}
close(fd);
return true;
}
static void kill_and_wait(int pid, int* status)
{
kill(-pid, SIGKILL);
kill(pid, SIGKILL);
for (int i = 0; i < 100; i++) {
if (waitpid(-1, status, WNOHANG | __WALL) == pid)
return;
usleep(1000);
}
DIR* dir = opendir("/sys/fs/fuse/connections");
if (dir) {
for (;;) {
struct dirent* ent = readdir(dir);
if (!ent)
break;
if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
continue;
char abort[300];
snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort", ent->d_name);
int fd = open(abort, O_WRONLY);
if (fd == -1) {
continue;
}
if (write(fd, abort, 1) < 0) {
}
close(fd);
}
closedir(dir);
} else {
}
while (waitpid(-1, status, __WALL) != pid) {
}
}

static void setup_test()
{
prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
setpgrp();
write_file("/proc/self/oom_score_adj", "1000");
}

#define KMEMLEAK_FILE "/sys/kernel/debug/kmemleak"

static void setup_leak()
{
if (!write_file(KMEMLEAK_FILE, "scan"))
exit(1);
sleep(5);
if (!write_file(KMEMLEAK_FILE, "scan"))
exit(1);
if (!write_file(KMEMLEAK_FILE, "clear"))
exit(1);
}

static void check_leaks(void)
{
int fd = open(KMEMLEAK_FILE, O_RDWR);
if (fd == -1)
exit(1);
uint64_t start = current_time_ms();
if (write(fd, "scan", 4) != 4)
exit(1);
sleep(1);
while (current_time_ms() - start < 4 * 1000)
sleep(1);
if (write(fd, "scan", 4) != 4)
exit(1);
static char buf[128 << 10];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
if (n < 0)
exit(1);
int nleaks = 0;
if (n != 0) {
sleep(1);
if (write(fd, "scan", 4) != 4)
exit(1);
if (lseek(fd, 0, SEEK_SET) < 0)
exit(1);
n = read(fd, buf, sizeof(buf) - 1);
if (n < 0)
exit(1);
buf[n] = 0;
char* pos = buf;
char* end = buf + n;
while (pos < end) {
char* next = strstr(pos + 1, "unreferenced object");
if (!next)
next = end;
char prev = *next;
*next = 0;
fprintf(stderr, "BUG: memory leak\n%s\n", pos);
*next = prev;
pos = next;
nleaks++;
}
}
if (write(fd, "clear", 5) != 5)
exit(1);
close(fd);
if (nleaks)
exit(1);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
int iter = 0;
for (;; iter++) {
int pid = fork();
if (pid < 0)
exit(1);
if (pid == 0) {
setup_test();
execute_one();
exit(0);
}
int status = 0;
uint64_t start = current_time_ms();
for (;;) {
if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
break;
sleep_ms(1);
if (current_time_ms() - start < 5000)
continue;
kill_and_wait(pid, &status);
break;
}
check_leaks();
}
}

uint64_t r[1] = {0xffffffffffffffff};

void execute_one(void)
{
intptr_t res = 0;
memcpy((void*)0x20000200, "/dev/sr0\000", 9);
res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000200ul, /*flags=*/0x1a9802ul, /*mode=*/0ul);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000740 = 0x53;
*(uint32_t*)0x20000744 = 0xfffffffe;
*(uint8_t*)0x20000748 = 0xa;
*(uint8_t*)0x20000749 = 0;
*(uint16_t*)0x2000074a = 0;
*(uint32_t*)0x2000074c = 0x20000;
*(uint64_t*)0x20000750 = 0;
*(uint64_t*)0x20000758 = 0x200005c0;
memcpy((void*)0x200005c0, "\xf6\xc7\x8b\x31\x9f\x83\x19\xde\xb1\x3d", 10);
*(uint64_t*)0x20000760 = 0x20000600;
*(uint32_t*)0x20000768 = 0;
*(uint32_t*)0x2000076c = 0;
*(uint32_t*)0x20000770 = 0;
*(uint64_t*)0x20000774 = 0;
*(uint8_t*)0x2000077c = 0;
*(uint8_t*)0x2000077d = 0;
*(uint8_t*)0x2000077e = 0;
*(uint8_t*)0x2000077f = 0;
*(uint16_t*)0x20000780 = 0;
*(uint16_t*)0x20000782 = 0;
*(uint32_t*)0x20000784 = 0;
*(uint32_t*)0x20000788 = 0;
*(uint32_t*)0x2000078c = 0;
syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0x2285, /*arg=*/0x20000740ul);
}
int main(void)
{
syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
setup_leak();
loop();
return 0;
}

Since I did not write any custom syzlang descriptions for syzkaller, this bug could very likely also be found by syzbot (an automated kernel-testing bot built on syzkaller).

And indeed, a search turns up the related discussion from 2023-08-17: https://lore.kernel.org/all/000000000000e32603060314b623@google.com/T/

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 27234a820eeb..c3fd0448dead 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1780,8 +1780,10 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
 	if (!maxpages)
 		return -ENOMEM;
 	res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages);
-	if (unlikely(res <= 0))
+	if (unlikely(res <= 0)) {
+		kvfree(*pages);
 		return res;
+	}
 	maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset);
 	iov_iter_advance(i, maxsize);
 	return maxsize;

The fix adds kvfree(*pages) on the error path, matching exactly the leak scenario identified above.