Hands-On GPU：Accelerated Computer Vision with OpenCV and CUDA

上QQ阅读APP看书，第一时间看更新

Local memory and registers

Local memory and register files are unique to each thread. Register files are the fastest memory available for each thread. When variables of the kernel do not fit in register files, they use local memory. This is called register spilling. Basically, local memory is a part of global memory that is unique for each thread. Access to local memory will be slow compared to register files. Though local memory is cached in L1 and L2 caches, register spilling might not affect your program adversely.

A simple program to understand how to use local memory is shown as follows:

#include <stdio.h>
#define N 5

__global__ void gpu_local_memory(int d_in)
{
  int t_local; 
  t_local = d_in * threadIdx.x; 
  printf("Value of Local variable in current thread is: %d \n", t_local);
}
int main(int argc, char **argv)
{
  printf("Use of Local Memory on GPU:\n");
  gpu_local_memory << <1, N >> >(5); 
  cudaDeviceSynchronize();
  return 0;
}

The t_local variable will be local to each thread and stored in a register file. When this variable is used for computation in the kernel function, the computation will be the fastest. The output of the preceding code is shown as follows: