A portable inter-workgroup barrier for GPUs

Overview

This page contains the supplementary material for the paper Portable Inter-workgroup Barrier Synchronisation for GPUs.

A write-up containing the mutex implementation details, the OpenCL 2.0 atomic implementation details, and further formalisations from the paper can be downloaded here.

The code for the experiments is hosted on GitHub here.

A tar file containing our experimental results can be downloaded here. This result set includes occupancy numbers found with the protocol, protocol timing results, timing results for Pannotia applications (with and without our barrier), timing results for Lonestar-GPU, and tuning parameters for both Pannotia and Lonestar-GPU applications.

Publications

  • Portable Inter-workgroup Barrier Synchronisation for GPUs

    Tyler Sorensen, Alastair F. Donaldson, Mark Batty, Ganesh Gopalakrishnan, Zvonimir Rakamaric

    31st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'16)