Make out like a (Multi-Armed) Bandit: Improving the Odds of Fuzzer Seed Scheduling with T-Scheduler

Fuzzing is a highly-scalable software testing technique that uncovers bugs in a target program by executing it with mutated inputs. Over the life of a fuzzing campaign, the fuzzer accumulates inputs inducing new and interesting target behaviors, drawing from these inputs for further mutation. This rapidly results in a large number of inputs to select from, making it challenging to quickly and accurately select the "most promising" input for mutation. Reinforcement learning (RL) provides a natural solution to this "seed scheduling" problem: the fuzzer dynamically adapts its selection strategy by learning from past results. However, existing RL approaches are (a) computationally expensive (reducing fuzzer throughput) and/or (b) require hyperparameter tuning (reducing generality across targets and input types). To this end, we propose T-Scheduler, a seed scheduler built on multi-armed bandit theory that automatically adapts to the target without any hyperparameter tuning. We evaluate T-Scheduler over 35 CPU-yr of fuzzing, comparing it to 11 state-of-the-art schedulers. Our results show that T-Scheduler improves on these 11 schedulers on both bug-finding and coverage-expansion abilities.
View on arXiv