Implementing and Evaluating Candidate-Based Invariant Generation

Abstract

The discovery of inductive invariants lies at the heart of static program verification. Presently, many automatic solutions to inductive invariant generation are inflexible, only applicable to certain classes of programs, or unpredictable. An automatic technique that circumvents these deficiencies to some extent is candidate-based invariant generation, whereby a large number of candidate invariants are guessed and then proven to be inductive or rejected using a sound program analyser. This paper describes our efforts to apply candidate-based invariant generation in GPUVerify, a static checker of programs that run on GPUs. We study a set of 383 GPU programs that contain loops, drawn from a number of open source suites and vendor SDKs. Among this set, 253 benchmarks require provision of loop invariants for verification to succeed.

We describe the methodology we used to incrementally improve the invariant generation capabilities of GPUVerify to handle these benchmarks, through candidate-based invariant generation, whereby potential program invariants are speculated using cheap static analysis and subsequently either refuted or proven. We also describe a set of experiments that we used to examine the effectiveness of our rules for candidate generation, assessing rules based on their generality (the extent to which they generate candidate invariants), hit rate (the extent to which the generated candidates hold), effectiveness (the extent to which provable candidates actually help in allowing verification to succeed), and influence (the extent to which the success of one generation rule depends on candidates generated by another rule). We believe that our methodology for devising and evaluating candidate generation rules may serve as a useful framework for other researchers interested in candidate-based invariant generation.
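The guess-and-check loop described above can be illustrated with a minimal sketch. It is not GPUVerify's implementation: the function name `houdini`, the toy loop, and the candidate names are all hypothetical, and exhaustive enumeration over a small finite domain stands in for a sound program analyser. Candidates that fail initially, or that are not preserved by the loop body under the conjunction of the surviving candidates, are dropped until a fixpoint is reached.

```python
from typing import Callable, Dict, Iterable, Set

State = int
Predicate = Callable[[State], bool]


def houdini(
    candidates: Dict[str, Predicate],
    init: Iterable[State],
    guard: Predicate,
    step: Callable[[State], State],
    domain: Iterable[State],
) -> Set[str]:
    """Return the names of candidates that are jointly inductive.

    Assumption: `domain` is a small finite set that we can enumerate,
    which replaces the sound analyser a real tool would call.
    """
    domain = list(domain)
    init = list(init)
    # Keep only candidates that hold on loop entry.
    alive = {name for name, p in candidates.items() if all(p(s) for s in init)}
    while True:
        # States at the loop head that satisfy the guard and every survivor.
        pre = [s for s in domain
               if guard(s) and all(candidates[n](s) for n in alive)]
        # A candidate is refuted if some such state steps to a violation.
        refuted = {n for n in alive
                   if any(not candidates[n](step(s)) for s in pre)}
        if not refuted:
            return alive
        alive -= refuted


# Toy loop (hypothetical): i = 0; while i < 10: i += 2
toy_candidates: Dict[str, Predicate] = {
    "nonneg": lambda i: i >= 0,
    "even": lambda i: i % 2 == 0,
    "le10": lambda i: i <= 10,
    "le8": lambda i: i <= 8,   # false: violated when i reaches 10
    "ne7": lambda i: i != 7,
}


def run(names: Iterable[str]) -> Set[str]:
    chosen = {n: toy_candidates[n] for n in names}
    return houdini(chosen, init=[0], guard=lambda i: i < 10,
                   step=lambda i: i + 2, domain=range(-5, 20))


result = run(toy_candidates)
result_without_even = run(["nonneg", "le10", "le8", "ne7"])
```

In this toy, `le10` and `ne7` survive only when `even` is also among the candidates; without it they are not inductive and are dropped. This is a small instance of the kind of dependency between candidates that the influence measure is meant to capture.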

The candidates produced by GPUVerify help to verify 231 of these 253 programs. This increase in precision, however, has made GPUVerify slower, because more candidates are generated and hence more time is spent determining which of them are inductive invariants. To speed up this process, we have investigated four under-approximating program analyses that aim to reject false candidates quickly, and a framework whereby these analyses can run in sequence or in parallel. Across two platforms, running Windows and Linux, our results show that the best combination of these techniques running sequentially speeds up invariant generation across our benchmarks by 1.17x (Windows) and 1.01x (Linux), with per-benchmark best speedups of 93.58x (Windows) and 48.34x (Linux), and worst slowdowns of 10.24x (Windows) and 43.31x (Linux). We find that parallelising the strategies marginally improves overall invariant generation speedups to 1.27x (Windows) and 1.11x (Linux), maintains good best-case speedups of 91.18x (Windows) and 44.60x (Linux), and, importantly, dramatically reduces worst-case slowdowns to 3.15x (Windows) and 3.17x (Linux).
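To make the idea of an under-approximating analysis concrete, the sketch below uses bounded concrete execution as one possible example. The abstract does not say which four analyses were studied, so this choice, the function name `refute_by_execution`, and the toy loop are all assumptions made for illustration. Because the analysis explores only some behaviours, it can refute candidates but never prove them.

```python
from typing import Callable, Dict, Iterable, Set

State = int
Predicate = Callable[[State], bool]


def refute_by_execution(
    candidates: Dict[str, Predicate],
    init: Iterable[State],
    guard: Predicate,
    step: Callable[[State], State],
    max_steps: int = 100,
) -> Set[str]:
    """Drop candidates violated on some concretely executed loop-head state.

    Survivors are NOT proven; they still need a sound inductiveness check.
    """
    alive = set(candidates)
    for s in init:
        for _ in range(max_steps):
            # Check every surviving candidate at the loop head.
            alive = {n for n in alive if candidates[n](s)}
            if not guard(s):
                break  # the loop exits; this was the final head visit
            s = step(s)
    return alive


# Toy loop (hypothetical): i = 0; while i < 10: i += 2
cands: Dict[str, Predicate] = {
    "nonneg": lambda i: i >= 0,
    "le10": lambda i: i <= 10,
    "le8": lambda i: i <= 8,   # false: i reaches 10 at the final head visit
    "ne7": lambda i: i != 7,   # true on this run, but not inductive alone
}

survivors = refute_by_execution(
    cands, init=[0], guard=lambda i: i < 10, step=lambda i: i + 2
)
```

Here `le8` is rejected cheaply, while `ne7` survives even though it is not inductive without a parity fact. A filter of this kind can therefore reduce the work passed to the sound prover, but it cannot replace it.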