A portable inter-workgroup barrier for GPUs
This page contains the supplementary material for the paper "Portable Inter-Workgroup Barrier Synchronisation for GPUs".
A write-up containing the mutex implementation details, the OpenCL 2.0 atomic implementation details, and further formalisations from the paper can be downloaded here.
The code for the experiments is hosted on GitHub here.
A tar file containing our experimental results can be downloaded here. This result set includes the occupancy numbers found with the protocol, protocol timing results, timing results for the Pannotia applications (with and without our barrier), timing results for Lonestar-GPU, and tuning parameters for both the Pannotia and Lonestar-GPU applications.