So you've written your software in C but it's still too slow. Your next optimization step would be SIMD instructions, but you are too lazy to learn how to use them yourself, or you are looking for an excuse to blow several grand on a graphics card and you hate Nvidia? OpenCL might be right for you!
How it works
While Nvidia's CUDA is made to be used by computer-illiterate scientists, OpenCL forces you to tell it exactly what you want, so you'll need to go through every step listed below. Both CUDA and OpenCL sit closer to the hardware than plain C, so you had better know what kind of hardware you expect your code to run on, or check it at runtime and write code for each kind of hardware.
- The environment is set up (platform, device, context, command queue). - This is where the program learns which device will execute the code, like which of your numerous GPUs it will use if you are overcompensating for something, etc.
- Memory initialization. - If you want to use the GPU, this is where you reserve memory on it. Special allocations also exist for the CPU. This can also be done later.
- Reading and building the program. - The kernel code has to be built at runtime, so that the earlier steps can decide which device the kernels will run on.
- Extracting the kernels. - Getting the individual kernel functions out of the compiled program.
- Write data to GPU memory, if the program targets a GPU.
- Use them! a.k.a. enqueue a kernel in a command queue. - You didn't do all that for nothing, did you?
- Read data back from GPU memory if the program targets a GPU, or unmap it for a CPU.
- Release the kernels, the memory, and the environment.
Tips
- Limit data transfers. They eat a significant amount of time.
- 256 work-items per GPU work-group will give good results most of the time.
- Check for errors after every call, or you'll only hear about them when freeing memory.
- Make sure to create a fuckload of threads. It is normal to sometimes have more than one thread per data element.
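If you do fix the work-group size at 256, OpenCL 1.x requires the global size to be a multiple of it, so the usual trick is to round the thread count up and have the surplus work-items do nothing. A small sketch in plain C; `round_up_global` is a made-up helper name:

```c
#include <stddef.h>

/* Round n up to the next multiple of local (the work-group size).
 * Hypothetical helper: the result is what you would pass as the global
 * work size to clEnqueueNDRangeKernel, with local as the local size. */
size_t round_up_global(size_t n, size_t local)
{
    return (n + local - 1) / local * local;
}
```

For 1000 data elements and work-groups of 256 this launches 1024 work-items, so the kernel needs a guard such as `if (get_global_id(0) < n)` to keep the extra 24 from touching memory past the end of the buffers.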
OpenCL is part of a series on Programming.