### A Sound and Complete Abstraction for Reasoning About Parallel Prefix Sums

#### Abstract

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance.

We introduce a novel abstraction, the interval of summations, that
allows scalable reasoning about implementations of prefix sums. We
present this abstraction as a monoid, and prove a soundness and
completeness result showing that a generic sequential prefix sum
implementation is correct for an array of length n if and only if it
computes the correct result for a specific test case when instantiated
with the interval of summations monoid. This allows correctness to be
established by running a single test where the input and result
require O(nlg(n)) space. This improves upon an existing result by
Sheeran where the input requires O(nlg(n)) space and the result
O(n^{2}lg(n)) space, and is more feasible for large n than a
method by Voigtländer thatuses O(n) space for the input and
result but requires running O(n^{2}) tests. We then extend our
abstraction and results to the context of data-parallel programs,
developing an automated verification method for GPU implementations of
prefix sums. Our method uses static verification to prove that a
generic prefix sum implementation is data race-free, after which
functional correctness of the implementation can be determined by
running a single test case under the interval of summations abstraction.

We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtländer’s method when applied to large arrays.