Overview
This Virtual Machine provides the basic requirements for CLsmith, which generates random OpenCL kernels, and for Oclgrind, an OpenCL emulator, which allows kernels to be run in a hardware-independent environment.
Tools present on this machine:
- CLsmith was obtained by cloning the repository available at http://www.github.com/ChrisLidbury/CLsmith.
- Oclgrind was obtained by cloning the repository available at http://www.github.com/jrprice/Oclgrind, at commit 8b6c02cec from October 14, 2014. We pinned this particular commit because a bug identified via CLsmith prompted the author to start refactoring the code, which may in turn remove other bugs we might identify. Because Oclgrind is hardware-independent, we include it along with programs for which we observed results that differ from those obtained on other platforms.
The folders of interest present on this machine are:
- CLSmith -- contains an already-built version of the tool. For additional information about usage, please refer to the provided file, CLSMITH_USAGE.txt.
- CLSmithTesting -- contains the essential files required for generating random programs and running them, as well as some additional hand-picked randomly-generated programs that demonstrate divergent results between Oclgrind and other devices. These were all generated by CLsmith, but regenerating them with the current version is not guaranteed to succeed, even if the same seed is given, due to changes made to the tool since. Three of these programs (006.cl, 012.cl and 015.cl) have been reduced, meaning most of the program has been deleted in order to obtain a minimal program exposing a possible bug.
- Tests -- contains all the random programs used in Sections 6.3 and 6.4 of the paper, which led to the results reported in Table 4 and Table 5 respectively. With this virtual machine, all results for Oclgrind (configuration 17) are reproducible by running the tests inside this folder.
- emibench -- contains a pre-built version of emibench, with the selected benchmarks from the Parboil and Rodinia suites; also contains the results we obtained from testing these on our available platforms. These results correspond to the experiment described in Section 6.2 of the paper, with results summarized in Table 3.
- Data -- contains the raw data obtained by running the experiments described in Sections 6.3 and 6.4 of the paper across all our configurations, along with scripts that can be used to produce Tables 4 and 5. Note that these results come from running the tests available in the Tests directory. Also note that for Table 5 we identified an error in the script that produces the table: it would classify a number of tests incorrectly at random, producing different results across runs. We have fixed this bug so that the script now always produces the correct result; unfortunately, the table in our submitted paper is not precise in terms of the program categories (we are making this precise for the camera-ready version).
Usage instructions
Generating random OpenCL programs via CLsmith and then running them
- execute
cd $HOME/CLSmithTesting
- execute
./CLSmith [flags]
to produce a random program named CLProg.c; use the following flags to produce a program for the corresponding mode:
--fake_divergence --group_divergence
for BASIC mode
--fake_divergence --group_divergence --vectors
for VECTOR mode
--fake_divergence --group_divergence --vectors --inter_thread_comm
for BARRIER mode
--fake_divergence --group_divergence --vectors --atomics
for ATOMIC SECTION mode
--fake_divergence --group_divergence --vectors --atomic_reductions
for ATOMIC REDUCTION mode
--fake_divergence --group_divergence --vectors --inter_thread_comm --atomics --atomic_reductions
for ALL mode
- for more information about the flags, please refer to CLSMITH_USAGE.txt
- in order to run the produced OpenCL kernel, execute
oclgrind ./cl_launcher -n Oclgrind -f CLProg.c
- if the execution is successful, the result for each work item will be printed on the screen, separated by commas; other possible outcomes are compiler crashes (usually producing a stack trace), runtime crashes (usually printing an OpenCL error code) or non-terminating programs
- because a randomly generated program may not terminate, it is sometimes necessary to kill the run if it does not finish in a reasonable time; however, as Oclgrind itself is fairly slow, we recommend waiting at least one minute before killing the run (a small wrapper sketch combining generation and a timed run appears after this list)
- in order to provide the seed for the random generation, pass the -s <seed> argument, where <seed> is a number; programs generated using the same seed will be identical (provided the generator itself has not been changed)
- in order to compile the kernel without optimisations, pass the ---disable_opts flag to cl_launcher; for the other available flags, execute
./cl_launcher ---help
; note that programs generated by CLsmith already have some required parameters embedded in the first line of the program, which the launcher parses; however, any command-line arguments take priority
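- for convenience, the following is a small wrapper sketch (our own script, not part of CLsmith; the per-mode file names and the 60-second timeout are choices made for illustration) that generates one program per mode and runs each under Oclgrind:

  #!/bin/bash
  # Sketch: generate one random kernel per CLsmith mode and run it under Oclgrind.
  # Assumes it is run from $HOME/CLSmithTesting, where CLSmith and cl_launcher live.
  cd "$HOME/CLSmithTesting" || exit 1
  run_mode() {
    local name="$1"; shift
    echo "=== $name mode ==="
    ./CLSmith "$@"                    # writes the random program to CLProg.c
    cp CLProg.c "CLProg_$name.c"      # keep a per-mode copy
    # Oclgrind is fairly slow, so allow up to 60 seconds before giving up (coreutils timeout).
    timeout 60 oclgrind ./cl_launcher -n Oclgrind -f "CLProg_$name.c" \
      || echo "$name: crashed, failed or timed out"
  }
  run_mode BASIC            --fake_divergence --group_divergence
  run_mode VECTOR           --fake_divergence --group_divergence --vectors
  run_mode BARRIER          --fake_divergence --group_divergence --vectors --inter_thread_comm
  run_mode ATOMIC_SECTION   --fake_divergence --group_divergence --vectors --atomics
  run_mode ATOMIC_REDUCTION --fake_divergence --group_divergence --vectors --atomic_reductions
  run_mode ALL              --fake_divergence --group_divergence --vectors --inter_thread_comm --atomics --atomic_reductions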
Running existing buggy Oclgrind programs
- we have provided a number of .cl programs, selected from our results, whose behavior under Oclgrind differs from that on the other platforms we have tested; they are located in the CLSmithTesting folder
- in order to run one of the programs, execute
oclgrind ./cl_launcher -n Oclgrind -f <program_name>
- most of them trigger compiler crashes, which are obvious problems; the rest run successfully but produce a different result than most other platforms we have tested; in order to observe this buggy behavior, the same programs need to be run on other hardware: if possible, copying the CLSmithTesting folder to a machine with OpenCL-compatible hardware and then executing
./cl_launcher -d 0 -p 0 -f <program_name>
should yield a different result. Note that -d 0 -p 0 refer to device index 0 and platform index 0, as identified by OpenCL; if the name of the device is known, these two arguments can be replaced by -n <device_name_contains>
- below, we briefly discuss the results that can be obtained from running the provided kernels (a loop that runs all of them under Oclgrind is sketched after this list):
- 006.cl -- the compiler crashes, most likely due to passing a pointer set to NULL to a function
- 012.cl -- Oclgrind returns 0, while other platforms return 0xffffffff; a possible cause could be the underflow of the unsigned integer happening in the for-loop
- 015.cl -- front-end compiler crash, possibly coming from LLVM
- basic_0022.cl -- Oclgrind returns 0x0 for each thread, while other devices return 0x8c3296ff16049011
- basic_0333.cl -- Oclgrind returns multiple results overall, while other devices return a single result for all threads, 0x71b80e5ffa5b1be2
- basic_0411.cl -- Oclgrind returns a different result for all threads, 0xb73999631d41d69, as opposed to 0xf941f54174cee616 returned by other devices
- basicV_1678.cl -- Oclgrind crashes, again possibly due to the LLVM interpreter
- basicV_2756.cl -- Similar crash as the above program
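- the following is a small convenience loop (our own sketch, not part of the artifact) that runs each of the programs listed above under Oclgrind; the commented-out line shows the corresponding command on real hardware:

  #!/bin/bash
  # Sketch: run each of the hand-picked kernels from CLSmithTesting under Oclgrind in turn.
  cd "$HOME/CLSmithTesting" || exit 1
  for prog in 006.cl 012.cl 015.cl basic_0022.cl basic_0333.cl basic_0411.cl basicV_1678.cl basicV_2756.cl; do
    echo "=== $prog ==="
    oclgrind ./cl_launcher -n Oclgrind -f "$prog"
    # On a machine with real OpenCL hardware, compare with:
    #   ./cl_launcher -d 0 -p 0 -f "$prog"
    echo
  done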
Generating results from the paper
- as this Virtual Machine comes with Oclgrind installed, all results for Oclgrind can be regenerated; IMPORTANT: running all the experiments again will take a very long time (estimated 2-3 days per batch of 5000 tests)
- from the Tests folder, copy over the desired batch of tests to CLSmithTesting
- execute
oclgrind ./cl_get_and_test -zipfile <test_batch_name> -device_name_contains Oclgrind
- the tests will be decompressed and will be executed one by one
- at the end of the execution, the results can be found in Results.csv
- these results are gathered in row 17 (+/- for with and without optimisations) of Table 4
- in order to run the programs from Table 5, copy over all the contents of the basic_viar_emi folder to CLSmithTesting, then execute
./runall.sh
; this will produce a corresponding .csv file for each zip; again, note that this will take a very long time to run
- we have provided a small zipfile containing the first 8 test programs generated under the ALL mode. It can be found in /Tests/basic_viar_8.zip. Using this zipfile should produce results in a matter of minutes (a sketch that uses it is given below).
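- for example, the following sketch (our own sequence of the steps above; it assumes the Tests folder sits at $HOME/Tests and that -zipfile accepts the zip file's name) reproduces a quick sample run with basic_viar_8.zip:

  #!/bin/bash
  # Sketch: reproduce a small Oclgrind run using the provided 8-program sample batch.
  cp "$HOME/Tests/basic_viar_8.zip" "$HOME/CLSmithTesting/"
  cd "$HOME/CLSmithTesting" || exit 1
  oclgrind ./cl_get_and_test -zipfile basic_viar_8.zip -device_name_contains Oclgrind
  cat Results.csv    # per-test outcomes, as described above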
Running emibench
- we would like to point out that, as shown in Table 3 of our paper, only one of the benchmarks successfully compiled and ran with Oclgrind, even without any code injection
- in order to run a single emiblock for a single benchmark (see also the sketch after this list), enter the benchmark's folder via
cd $HOME/emibench/<benchmark>
and execute
oclgrind ../scripts/runsingle.py --optimisations [0|1] --substitutions [0|1] --emi_block [0..124|999]
- in order to run all emiblocks for a benchmark, execute
oclgrind ../scripts/runall.py
from within the benchmark's folder
- in order to run all emiblocks over all benchmarks, execute
./root_runall.py
from the $HOME/emibench folder
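- as an example, the sketch below (our own wrapper; the helper name and the chosen parameter values are illustrative) runs EMI block 0 of a single benchmark with and without optimisations:

  #!/bin/bash
  # Sketch: run EMI block 0 of a single benchmark with and without optimisations.
  # Usage: ./run_one_block.sh <benchmark>   (hypothetical helper name; <benchmark> is a folder under $HOME/emibench)
  bench="$1"
  cd "$HOME/emibench/$bench" || exit 1
  for opt in 0 1; do
    echo "=== $bench, emi_block 0, optimisations=$opt ==="
    oclgrind ../scripts/runsingle.py --optimisations "$opt" --substitutions 0 --emi_block 0
  done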
Generating the tables from the paper
- we have included the data required to produce Tables 3, 4 and 5 from the paper; for Tables 4 and 5, you need to be inside either sub-folder of the Data folder; for Table 3, you need to be inside $HOME/emibench/results
- execute
./crunch.py > table.tex
, followed by
pdflatex wrapper.tex
; this will produce a file named wrapper.pdf
- copy this file over to another machine, for example with
scp
, or by any other method, and open it in a PDF viewer to inspect the generated table; we apologise that the virtual machine does not provide a graphical desktop for this purpose (a short sketch combining these steps is given below)
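- putting the above together, the following sketch (our own helper; the folder argument and the scp destination are placeholders for your environment) regenerates one table and copies the PDF off the machine:

  #!/bin/bash
  # Sketch: regenerate one table and copy the resulting PDF to another machine.
  # Usage: ./make_table.sh <folder> user@host   (hypothetical helper name; <folder> is one of the
  # Data sub-folders, or $HOME/emibench/results for Table 3; user@host is a placeholder destination)
  cd "$1" || exit 1
  ./crunch.py > table.tex       # writes the LaTeX table
  pdflatex wrapper.tex          # produces wrapper.pdf next to wrapper.tex
  scp wrapper.pdf "$2":         # or copy the PDF off the VM by any other means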