A portable inter-workgroup barrier for GPUs
Overview
This page contains the supplementary material for the paper Portable Inter-workgroup Barrier Synchronisation for GPUs.
A write-up containing the mutex implementation details, the OpenCL 2.0 atomic implementation details, and further formalisations from the paper can be downloaded here.
The code for the experiments is hosted on GitHub here.
A tar file containing our experimental results can be downloaded here. This result set includes occupancy numbers found with the protocol, protocol timing results, timing results for Pannotia applications (with and without our barrier), timing results for Lonestar-GPU, and tuning parameters for both Pannotia and Lonestar-GPU applications.
Publications
- Portable Inter-workgroup Barrier Synchronisation for GPUs
Tyler Sorensen, Alastair F. Donaldson, Mark Batty, Ganesh Gopalakrishnan, Zvonimir Rakamaric
31st Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'16)