Randomised Testing of the Compiler for a Verification-Aware Programming Language

Abstract

We present the design and implementation of two new tools for randomised testing of the compiler of the Dafny programming language. The Dafny language and tool-chain supports formal verification of rich functional properties of programs, and is seeing increasing adoption by industry, being used by companies such as Amazon, Consensys, Microsoft and VMWare for the construction of high assurance software components. Bugs in the Dafny compiler are of critical importance because they have the potential to undermine the safety and security of deployed software that has been formally verified at the source code level. Our new testing tools, fuzz-d and DafnyFuzz, are based on randomised program generation, and overcome the test oracle problem using a combination of differential testing, metamorphic testing and the generation of programs with known expected results. The new tools go significantly further than XDsmith, an existing randomised compiler testing tool for Dafny, in terms of the features of the language that they support. We have used these tools to find and report 24 previously-unknown Dafny compiler bugs that were beyond the reach of XDsmith, of which 9 are soundness issues. Our fuzzing campaign has also led to changes to the Dafny language specification via the identification of ambiguous or under-documented language features. We present a set of controlled experiments looking at statement and mutation coverage on the Dafny compiler code base. The results show that fuzz-d and DafnyFuzz achieve substantial additional coverage on top of that provided by XDsmith, and can cover some areas missed by the Dafny compiler regression test suite. All three of fuzz-d, DafnyFuzz and XDsmith improve upon the number of mutants killed by the Dafny regression test suite.