Fuzz Testing and Evaluation

Event details
Date | 24.01.2020
Hour | 15:00 – 17:00
Speaker | Ahmad Hazimeh
Location |
Category | Conferences - Seminars
EDIC candidacy exam
Exam president: Prof Jean-Pierre Hubaux
Thesis Advisor: Prof Mathias Payer
Co-examiner: Prof Jim Larus
Abstract
Fuzzing is a prominent dynamic testing technique that aims to discover software bugs through brute force. Fuzzers have evolved in different directions, with the common goal of maximizing the efficiency of the process. However, the lack of proper benchmarks and performance metrics results in ad-hoc evaluations that prohibit fair comparisons between fuzzers. In this study, we examine an existing ground-truth benchmark, LAVA-M, which has become the de-facto standard for fuzzer evaluation, and we shed light on its shortcomings. Among the fuzzers evaluated against LAVA-M, Angora displayed the highest detection rate and was thus the subject of scrutiny. We explore Angora's strengths and weaknesses and discuss the pitfalls in its evaluation. Lastly, the third background paper surveys previous evaluations, highlights the drawbacks of common practices, and suggests guidelines for more consistent evaluations. Based on these works, we propose Magma, a ground-truth fuzzing benchmark with real programs and bugs, designed to closely mimic in-the-wild scenarios for evaluating software testing tools.
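For readers unfamiliar with the technique, the following minimal sketch (hypothetical, not taken from any of the papers below) illustrates the core loop of a mutation-based fuzzer: repeatedly mutate a seed input, run the target program on it, and save any input that makes the target crash. The target binary and seed are placeholders.

import hashlib
import random
import subprocess

def mutate(data: bytes) -> bytes:
    """Flip a few random bytes of the seed input."""
    buf = bytearray(data)
    for _ in range(random.randint(1, 8)):
        pos = random.randrange(len(buf))
        buf[pos] = random.randrange(256)
    return bytes(buf)

def fuzz(target: str, seed: bytes, iterations: int = 10000) -> None:
    """Feed mutated inputs to `target` via stdin and record crashing inputs."""
    for _ in range(iterations):
        candidate = mutate(seed)
        proc = subprocess.run([target], input=candidate,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        # A negative return code means the process was killed by a signal
        # (e.g. SIGSEGV), which we treat as a crash.
        if proc.returncode < 0:
            name = hashlib.sha1(candidate).hexdigest()[:12]
            with open(f"crash_{name}.bin", "wb") as f:
                f.write(candidate)

if __name__ == "__main__":
    # Hypothetical target and seed; replace with a real program under test.
    fuzz("./target_binary", b"GIF89a" + b"\x00" * 64)

Modern fuzzers such as Angora extend this blind loop with feedback (e.g. code coverage or taint tracking) to guide input generation, which is precisely what benchmarks like LAVA-M and Magma aim to measure.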
Background papers
Angora: Efficient Fuzzing by Principled Search, by Peng Chen and Hao Chen, IEEE S&P 2018.
LAVA: Large-Scale Automated Vulnerability Addition, by Brendan Dolan-Gavitt et al., IEEE S&P 2016.
Evaluating Fuzz Testing, by George Klees et al., ACM CCS 2018.
Practical information
- General public
- Free
Contact
- EDIC - edic@epfl.ch