
CUDA API functions

In the variable addition program, we encountered some functions and keywords that are not familiar to regular C or C++ programmers: __global__, cudaMalloc, cudaMemcpy, and cudaFree. In this section, these are explained in detail one by one:

  • __global__: It is one of three qualifier keywords, along with __device__ and __host__. This keyword indicates that a function is declared as a device function: it will execute on the device when called from the host. Keep in mind that a __global__ function can only be called from the host. If you want a function to execute on the device and be called from a device function, then you have to use the __device__ keyword. The __host__ keyword is used to define host functions that can only be called from other host functions; these are just like normal C functions, and by default, all functions in a program are host functions. __host__ and __device__ can also be used together on the same function, which generates two versions of it: one executes on the host and the other on the device. The short sketch below illustrates all three qualifiers.
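The following is a minimal sketch showing the three qualifiers side by side; the function names squareOnDevice, addKernel, and cube are illustrative and are not part of the variable addition program:

__device__ int squareOnDevice(int x)      // callable only from device code
{
    return x * x;
}

__global__ void addKernel(int a, int b, int *d_result)   // executes on the device, called from the host
{
    *d_result = squareOnDevice(a) + b;
}

__host__ __device__ int cube(int x)       // compiled twice: one host version and one device version
{
    return x * x * x;
}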

 

  • cudaMalloc: It is similar to the malloc function used in C for dynamic memory allocation. This function is used to allocate a memory block of a specific size on the device. The syntax of cudaMalloc, with an example, is as follows:
cudaMalloc(void ** d_pointer, size_t size)
Example: cudaMalloc((void**)&d_c, sizeof(int));

As shown in the preceding example code, it allocates a memory block of size equal to the size of one integer variable and stores the address of that device memory in the pointer d_c, so d_c points to this memory location.
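As a side note, cudaMalloc returns a status code of type cudaError_t, so the allocation can be checked before the pointer is used. The following is a minimal sketch, assuming stdio.h is included; the error-handling style is our own and not taken from the original program:

int *d_c;                                                      // device pointer
cudaError_t status = cudaMalloc((void**)&d_c, sizeof(int));    // allocate one int on the device
if (status != cudaSuccess)
{
    printf("cudaMalloc failed: %s\n", cudaGetErrorString(status));
}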

  • cudaMemcpy: This function is similar to the memcpy function in C. It is used to copy a block of memory from one location to another, where each location can be on the host or on the device. It has the following syntax:
cudaMemcpy ( void * dst_ptr, const void * src_ptr, size_t size, enum cudaMemcpyKind kind )
Example: cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

This function has four arguments. The first and second arguments are the destination pointer and the source pointer, which point to a host or device memory location. The third argument indicates the size of the copy, and the last argument indicates the direction of the copy: it can be from host to device, device to device, host to host, or device to host. Be careful to match this direction with the pointers passed as the first two arguments. As shown in the example, we are copying a block of one integer variable from the device to the host by specifying the device pointer d_c as the source and the address of the host variable h_c as the destination.
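To make the relationship between the direction flag and the first two pointer arguments concrete, the following is a minimal sketch of both copy directions around a kernel launch; the variable names h_in, h_out, d_in, and d_out are illustrative and not from the original program:

int h_in = 5, h_out = 0;                       // host variables
int *d_in, *d_out;                             // device pointers

cudaMalloc((void**)&d_in, sizeof(int));
cudaMalloc((void**)&d_out, sizeof(int));

// Host to device: the device pointer is the destination, the host address is the source
cudaMemcpy(d_in, &h_in, sizeof(int), cudaMemcpyHostToDevice);

// ... launch a kernel here that reads *d_in and writes *d_out ...

// Device to host: the host address is the destination, the device pointer is the source
cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

cudaFree(d_in);
cudaFree(d_out);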

  •  cudaFree: It is similar to the free function available in C. The syntax of cudaFree is as follows:
cudaFree ( void * d_ptr )
Example: cudaFree(d_c);

It frees the memory space pointed to by d_ptr. In the example code, it frees the memory location pointed to by d_c. Please make sure that d_c was allocated with cudaMalloc before freeing it with cudaFree.
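Putting these pieces together, a minimal sketch of the variable addition pattern described in this section might look as follows; the kernel name gpuAdd, the operand values, and the <<<1, 1>>> launch configuration are assumptions for illustration and may differ from the book's listing:

#include <stdio.h>

// Kernel: adds two integers passed by value and writes the result to device memory
__global__ void gpuAdd(int a, int b, int *d_c)
{
    *d_c = a + b;
}

int main(void)
{
    int h_c;                                   // host variable for the result
    int *d_c;                                  // device pointer for the result

    cudaMalloc((void**)&d_c, sizeof(int));     // allocate one int on the device

    gpuAdd<<<1, 1>>>(1, 4, d_c);               // launch the kernel with one block of one thread

    // Copy the result from device memory back to the host variable
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("1 + 4 = %d\n", h_c);

    cudaFree(d_c);                             // free the device memory allocated above
    return 0;
}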

There are many other keywords and functions available in CUDA, over and above the existing ANSI C functions. We will be using the functions covered here (along with the __global__ qualifier) most frequently, and hence they are discussed in this section. For more details, you can always refer to the CUDA programming guide.