The results demonstrate that the configuration (function arguments) makes a massive difference to CPU/GPU performance. That said, even the slowest configurations on the slowest GPUs are in the same ballpark, performance-wise, as the fastest configurations on the most powerful CPUs in the test. This, combined with a higher average performance for all GPUs tested, implies that you should nearly always see an improvement when moving to the GPU if you have several OpenCV functions in your pipeline (as long as you don't keep moving your data to and from the GPU), even if you are using a low-end, two-generation-old laptop GPU (730m).

Now let's examine some individual OpenCV functions. Because each function has many configurations, the average execution time over all configurations tested is used, for each function, to calculate the speedup over the i5-4210U. This provides a guide to the expected performance of a function irrespective of the specific configuration. The next figure shows the top 20 functions where the GPU speedup was largest. It is worth noting that the speedup of the GTX 1060 over all of the CPUs is so large that it has to be shown on a log scale. Next, the bottom 20 functions where the GPU speedup was smallest. The above figure demonstrates that, although the CUDA implementations are on average much quicker, some functions are significantly quicker on the CPU. Generally this is due to the function using the Intel Integrated Performance Primitives for Image processing and Computer Vision (IPP-ICV) and/or SIMD instructions.
The GPUs tested comprise three different micro-architectures, ranging from a low-end laptop (730m) to a mid-range desktop (GTX 1060) GPU. The full specifications are shown below, where I have also included the maximum theoretical speedup if the OpenCV functions were bandwidth or compute limited. This value is only an indication of what should be possible, leaving aside architectural improvements, SM count, etc. In general, most algorithms will be bandwidth limited, implying that the average speedup of the OpenCV functions could be somewhere between these two values. If you are not familiar with this concept then I would recommend watching Memory Bandwidth Bootcamp: Best Practices, Memory Bandwidth Bootcamp: Beyond Best Practices and Memory Bandwidth Bootcamp: Collaborative Access Patterns by Tony Scudiero for a good overview.

The CPUs tested also comprise three different micro-architectures, ranging from a low-end laptop dual core (i5-4210U) to a mid-range desktop quad core (i5-6500) CPU. The full specifications are shown below, where I have again included the maximum theoretical speedup depending on whether the OpenCV functions are limited by the CPU bandwidth or clock speed (I could not find any Intel-published GFLOPS information).

The results for all tests are available here, where you can check whether a specific configuration benefits from an improvement in performance when moved to the GPU. To get an overall picture of the performance increase which can be achieved by using the CUDA functions over the standard CPU ones, the speedup of each CPU/GPU over the least powerful CPU (i5-4210U) is compared. The figure below shows the speedup averaged over all 5300 tests (All Configs). Because the average speedup is influenced by the number of different configurations tested per OpenCV function, two additional measures, which only consider one configuration per function, are also shown on the figure below.
Hardware: Four different hardware configurations were tested, consisting of three laptops and one desktop. The CPU/GPU combinations are listed below:
In this post I am going to use OpenCV's performance tests to compare the CUDA and CPU implementations. The idea is to get an indication of which OpenCV and/or computer vision algorithms, in general, benefit the most from GPU acceleration and, therefore, under what circumstances it might be a good idea to invest in a GPU.

Average OpenCV GPU Performance Increase

Software: OpenCV 3.4 compiled on Visual Studio 2017 with CUDA 9.1, Intel MKL with TBB, and TBB. To generate the CPU results I simply ran the CUDA performance tests with CUDA disabled, so that the fallback CPU functions were called, by changing the following on line 228 of `modules\ts\include\opencv2\ts\ts_perf.hpp`: `#define PERF_RUN_CUDA() false //::perf::GpuPerf::targetDevice()`

The performance tests cover 104 of the OpenCV functions, with each function being tested for a number of different configurations (function arguments). The total number of different CUDA performance configurations/tests which ran successfully is 6031, of which only 5300 configurations are supported by both the GPU and CPU.
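To illustrate why changing the `PERF_RUN_CUDA()` define is enough to produce CPU timings, here is a self-contained sketch of the pattern. It assumes each CUDA performance test branches on the macro and otherwise falls back to the CPU function. The three functions below are hypothetical stand-ins, not OpenCV code; only the `#define` line is taken from the change described above.

```cpp
#include <string>

// The modified define: hard-coded to false instead of asking the
// perf framework which device is being targeted.
#define PERF_RUN_CUDA() false //::perf::GpuPerf::targetDevice()

// Hypothetical stand-in for the CUDA branch of a performance test.
std::string runCudaImplementation() { return "cuda"; }

// Hypothetical stand-in for the fallback CPU branch of the same test.
std::string runCpuImplementation() { return "cpu"; }

// Assumed shape of a test body: one branch per implementation,
// selected by the macro.
std::string runPerfTestBody() {
    if (PERF_RUN_CUDA()) {
        return runCudaImplementation();
    } else {
        return runCpuImplementation();
    }
}
```

With the define set to `false`, every test that follows this shape takes the CPU branch, so the same set of test configurations can be timed on both devices. Restoring the commented-out expression switches the tests back to the GPU path.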