GPU concurrency: weak behaviours and programming assumptions

Jade Alglave • Mark Batty • Alastair F. Donaldson • Ganesh Gopalakrishnan • Jeroen Ketema • Daniel Poetzl • Tyler Sorensen • John Wickerson

Introduction

This webpage contains the appendices and additional material for the ASPLOS 2015 paper GPU concurrency: weak behaviours and programming assumptions.

Technical appendices

The technical appendices contains the additional exch-sl litmus test, and details several aspects of our AMD testing campaign: the mappings from OpenCL to the Evergreen and Southern Islands ISAs, an OpenCL version of the CUDA spin-lock example, and the OpenCL analogues of the PTX litmus tests given in the paper.


The rest of the additional material is grouped according to the relevant section of the main submission.

Companion material for Section 3

We provide the full results of executing the litmus tests that we distilled from CUDA and OpenCL code.

Companion material for Section 4

We provide the full results from our evaluation of the effectiveness of various combinations of our incantations.

Companion material for Section 5

We provide the full results from running the litmus tests that we used to validate our formal model of PTX. The results are available in several layouts. 'Grouped by principle' means that the tests are grouped according to the principle (e.g. SC-per-location) that the test represents. 'Grouped by configuration' means that the tests are grouped according to whether the testing threads are in the same CTA or different CTAs, and whether the testing locations are in shared or global memory. There are 10930 test results, so the hyperlinked files are quite large; we therefore provide a shorter results page for each of the layouts, that lists only those tests for which the weak behaviour was observed on at least one chip.