OpenCV GPU FAQ
Requirements
What GPUs are supported?
All CUDA-enabled GPUs with compute capability 1.1 or higher. The full list is here. Latest-generation high-end or mid-range GPUs are recommended; a latest-generation mid-range GPU usually outperforms a high-end GPU from the previous generation.
What can I expect from my card?
Performance figures for some low-level and high-level functionality can be obtained by running the performance sample (opencv/samples/gpu/performance).
What operating systems are supported?
- Windows (32 and 64 bit)
- Linux (32 and 64 bit)
- MacOS
What compilers are supported?
g++, Visual Studio 2008/2010. Other compilers are not supported by NVCC.
What external software is required?
Only the latest CUDA Toolkit for your platform.
Build
How do I build OpenCV with GPU support?
Set the WITH_CUDA flag in CMake, check that all CUDA Toolkit paths are detected correctly (correct them if not), and point CMake to the NPP library location for your platform.
How do I set the target GPU architectures for OpenCV?
This is done in CMake by setting the CUDA_ARCH_BIN variable (generates ready-to-run binary code) and the CUDA_ARCH_PTX variable (generates intermediate code that is compiled just in time). Example:
cmake -DWITH_CUDA:BOOL="1" -DCUDA_NPP_LIBRARY_ROOT_DIR:PATH="C:/NPP/SDK" -DCUDA_ARCH_BIN:STRING="1.1 1.3 2.0(1.3) 2.1(1.3)" -DCUDA_ARCH_PTX:STRING="1.3"
This configuration builds binary code for GPUs with compute capability 1.1, 1.3, 2.0 (using 1.3 instructions only) and 2.1 (using 1.3 instructions only), plus intermediate (PTX) code for compute capability 1.3. For more details, see the NVCC documentation and the OpenCV manual.
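To make the list syntax concrete, here is a minimal C++ sketch that splits a CUDA_ARCH_BIN-style string into its entries. The ArchEntry type and the parseArchList function are hypothetical helpers written for this FAQ; they are not part of OpenCV or CMake, and they only illustrate how the "X.Y(A.B)" notation described above is read.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One entry of a CUDA_ARCH_BIN-style list (hypothetical helper type).
struct ArchEntry {
    std::string binary;        // architecture the binary code targets, e.g. "2.0"
    std::string instructions;  // instruction set it is limited to, e.g. "1.3"
};

// Splits a space-separated list such as "1.1 1.3 2.0(1.3)" into entries.
// "X.Y" is read as binary X.Y using X.Y instructions;
// "X.Y(A.B)" is read as binary X.Y using A.B instructions only.
std::vector<ArchEntry> parseArchList(const std::string& list) {
    std::vector<ArchEntry> entries;
    std::istringstream in(list);
    std::string token;
    while (in >> token) {
        const std::string::size_type open = token.find('(');
        if (open == std::string::npos) {
            entries.push_back({token, token});
            continue;
        }
        const std::string::size_type close = token.find(')', open);
        assert(close != std::string::npos && "unbalanced parenthesis");
        entries.push_back({token.substr(0, open),
                           token.substr(open + 1, close - open - 1)});
    }
    return entries;
}
```

Calling parseArchList on the CUDA_ARCH_BIN value from the example yields four entries, the last two pairing binaries 2.0 and 2.1 with the 1.3 instruction set.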
Can I check which platforms the GPU module was built for?
Yes. The cv::gpu::TargetArchs class reports which architectures the OpenCV GPU module was built for.
Help
Where is the documentation?
There is a GPU section in opencv.pdf. It covers all OpenCV functions ported to the GPU as well as functions available only in the GPU module.
Are there any samples?
Yes, in opencv/samples/gpu. To build them, set the BUILD_SAMPLES flag in CMake when building OpenCV.
Usage
Why should I use the GPU module?
To achieve higher performance in your application.
Is extra GPU knowledge necessary?
No, but it is recommended to understand what each operation does and how much it costs. Also:
- Avoid unnecessary memory transfers between the CPU and the GPU
- The GPU outperforms the CPU mostly on large data
Can I obtain information about a specific GPU?
Yes. Use the cv::gpu::DeviceInfo class to obtain information about the current or a specified GPU device (e.g. you can check whether your GPU is one of the GPUs the module was built for).
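The "built for" check follows CUDA's general compatibility rules: binary code built for compute capability X.y runs on devices X.z with z >= y, and PTX code can be compiled just in time for any device of equal or higher capability. The C++ sketch below models only that rule. Capability, binaryRunsOn, ptxRunsOn and deviceIsCovered are hypothetical names introduced here; the code does not call OpenCV and is not the module's actual implementation.

```cpp
#include <cassert>
#include <vector>

// Compute capability as a pair, e.g. {1, 3} for 1.3.
// Hypothetical helper type, not part of OpenCV.
struct Capability {
    int major_ver;
    int minor_ver;
};

// Binary code built for X.y runs on devices X.z with z >= y (same major).
bool binaryRunsOn(const Capability& built, const Capability& device) {
    return built.major_ver == device.major_ver &&
           built.minor_ver <= device.minor_ver;
}

// PTX built for A.b can be JIT-compiled for any device of capability >= A.b.
bool ptxRunsOn(const Capability& built, const Capability& device) {
    if (built.major_ver != device.major_ver)
        return built.major_ver < device.major_ver;
    return built.minor_ver <= device.minor_ver;
}

// True if at least one binary or PTX target can serve the device.
bool deviceIsCovered(const std::vector<Capability>& binTargets,
                     const std::vector<Capability>& ptxTargets,
                     const Capability& device) {
    assert(device.major_ver >= 1 && device.minor_ver >= 0);
    for (const Capability& built : binTargets)
        if (binaryRunsOn(built, device)) return true;
    for (const Capability& built : ptxTargets)
        if (ptxRunsOn(built, device)) return true;
    return false;
}
```

With binary targets 1.1, 1.3, 2.0, 2.1 and a PTX target of 1.3, a device of capability 1.0 is not covered, while a device newer than every binary target is still covered through the PTX code.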
How do I use the GPU in my CPU-based OpenCV app?
Minimal code changes are required. The main building block of a GPU-based application is the GpuMat class, the counterpart of the Mat class in the CPU OpenCV API. You can convert one into the other and mix them in code.
Are the CPU and GPU OpenCV APIs similar?
The GPU module is designed to be as similar to the CPU API as possible. Ideally you can switch between CPU and GPU by replacing Mat objects with GpuMat objects. Unfortunately this is not always possible, and following CPU best practices is not always optimal on the GPU.
Can I use two or more GPUs?
All module functions and classes target a single GPU. However, you can manually split work between several GPUs using the CUDA Driver API or the MultiGpuManager class (see the opencv/samples/gpu/*multi samples).
Performance
Why is the first function call slow?
Because of initialization overhead. The CUDA Runtime API is initialized implicitly on the first GPU function call. In addition, some GPU code is compiled for your video card (just-in-time compilation) on first use. So when measuring performance, make a dummy function call first and only then run the timed tests.
If this matters for an application that runs GPU code only once, you can use a compilation cache that persists across runs. See the nvcc documentation for details (CUDA_DEVCODE_CACHE environment variable).
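The warm-up advice can be sketched as a small timing helper in standard C++. The averageMsAfterWarmUp function is a hypothetical name introduced here, not an OpenCV function. It assumes the callable returns only once its GPU work has finished; otherwise the measured time would be too short.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once untimed, then returns the mean wall-clock time in
// milliseconds over `iterations` timed calls.
template <typename Work>
double averageMsAfterWarmUp(Work work, int iterations) {
    assert(iterations > 0);
    work();  // warm-up call: absorbs one-time initialization and JIT costs
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Pass a lambda that wraps the GPU function under test. Averaging over several iterations also smooths out timer noise on short calls.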
Bad Practices
Why is it bad to leave static GpuMat variables allocated?
Because this can throw an exception during deinitialization. GPU memory release functions report errors if the CUDA context has already been released. With the CUDA Runtime API, contexts are most likely controlled by static variables generated by NVCC, and the destruction order of static variables across translation units is unspecified.
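The failure mode can be modeled without CUDA. In the C++ sketch below, MockContext and MockGpuBuffer are hypothetical stand-ins for the CUDA context and a GpuMat-like buffer, and shutdown is a hypothetical helper. None of them are OpenCV types; they only illustrate why memory should be released at a known point while the context is still alive.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the CUDA context.
struct MockContext {
    bool alive = true;
};

// Hypothetical stand-in for a buffer that owns GPU memory.
class MockGpuBuffer {
public:
    explicit MockGpuBuffer(MockContext& ctx) : ctx_(&ctx), allocated_(true) {}

    // Models the failure mode: freeing after the context is gone is an error.
    void release() {
        if (!allocated_) return;
        if (!ctx_->alive)
            throw std::runtime_error("context already destroyed");
        allocated_ = false;
    }

    bool allocated() const { return allocated_; }

private:
    MockContext* ctx_;
    bool allocated_;
};

// Safe pattern: release explicitly, in a fixed order, before teardown.
void shutdown(MockGpuBuffer& buffer, MockContext& context) {
    buffer.release();        // free GPU memory first
    assert(!buffer.allocated());
    context.alive = false;   // only then let the context go away
}
```

In real code the corresponding fix is to avoid static GpuMat objects, or to call release() on them before main returns, so that nothing is freed during static destruction.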
Other
Can I request new functionality?
Yes. Use the OpenCV GPU Feature Request form.
How do I ask a question?
Use the OpenCV Yahoo group.
What about OpenCL?
There is no OpenCL support at the moment.