CUDA is a parallel computing platform and programming model for NVIDIA GPUs. It provides several abstractions for managing device memory and running code on the GPU. The following functions are the basic building blocks of CUDA programming. This post will be updated later with more information.
Memory allocation
    cudaError_t cudaMalloc(void **devPtr, size_t size);
This function allocates size bytes of memory on the GPU and writes the address of the allocated space to devPtr. If it fails, it returns an error code (a cudaError_t value other than cudaSuccess).
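A minimal usage sketch with error checking; the buffer name and element count are illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t n = 1 << 20;            // number of floats (illustrative)
        float *d_buf = nullptr;              // filled in by cudaMalloc

        // Allocate n floats on the GPU and check the returned error code.
        cudaError_t err = cudaMalloc(&d_buf, n * sizeof(float));
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        cudaFree(d_buf);                     // release it again (next section)
        return 0;
    }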
Memory deallocation
    cudaError_t cudaFree(void *devPtr);
This function releases the GPU memory pointed to by devPtr. If it fails, it returns an error code as its return value.
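A short sketch of the allocate/free pairing, this time checking the result of the free as well (names are illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int *d_data = nullptr;
        cudaMalloc(&d_data, 256 * sizeof(int));   // allocate 256 ints on the device

        // Release the device memory and check the returned error code.
        cudaError_t err = cudaFree(d_data);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        return 0;
    }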
Memory copy
    cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, cudaMemcpyKind kind);
This function copies count bytes from src to dst, either between CPU (host) and GPU (device) memory or within one of them. The kind argument selects among the following five transfer directions; a usage sketch follows the list.
- cudaMemcpyHostToHost : Copy within CPU memory.
- cudaMemcpyHostToDevice : Copy from CPU memory to GPU memory.
- cudaMemcpyDeviceToHost : Copy from GPU memory to CPU memory.
- cudaMemcpyDeviceToDevice : Copy within GPU memory.
- cudaMemcpyDefault : Infer the direction from the pointer values. This requires a system with unified virtual addressing.
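A minimal host-to-device and device-to-host round trip; the buffer size and names are illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const size_t n = 1024;
        const size_t bytes = n * sizeof(float);

        float *h_data = (float *)malloc(bytes);      // host buffer
        for (size_t i = 0; i < n; ++i) h_data[i] = (float)i;

        float *d_data = nullptr;                     // device buffer
        cudaMalloc(&d_data, bytes);

        // Host -> device, then device -> host.
        cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

        printf("h_data[10] = %f\n", h_data[10]);     // still 10.0 after the round trip

        cudaFree(d_data);
        free(h_data);
        return 0;
    }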
Kernel call
To run a kernel on the GPU, you launch it with the syntax below. Notice that the launch configuration between the triple angle brackets takes three parameters: the grid dimensions (number of blocks), the block dimensions (threads per block), and the dynamic shared memory size in bytes.
    kernelName<<<gridDim, blockDim, sharedMemBytes>>>(arguments);
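For example, launching a hypothetical kernel named scaleKernel over n elements with 256 threads per block and no dynamic shared memory:

    #include <cuda_runtime.h>

    // Illustrative kernel: multiply each element by a constant factor.
    __global__ void scaleKernel(float *data, float factor, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const size_t n = 1000;                 // deliberately not a multiple of the block size
        float *d_data = nullptr;
        cudaMalloc(&d_data, n * sizeof(float));

        const int threadsPerBlock = 256;
        const int blocks = (int)((n + threadsPerBlock - 1) / threadsPerBlock);  // round up

        // <<<grid dimensions, block dimensions, shared memory bytes>>>
        scaleKernel<<<blocks, threadsPerBlock, 0>>>(d_data, 2.0f, n);
        cudaDeviceSynchronize();               // wait for the kernel to finish

        cudaFree(d_data);
        return 0;
    }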
Example
To use the functions above, you also need a kernel function. A minimal example is sketched below.
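The following sketch puts the pieces together with an element-wise vector addition; the kernel name addKernel, the sizes, and the data are illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Illustrative kernel: element-wise addition c = a + b.
    __global__ void addKernel(const float *a, const float *b, float *c, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const size_t n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Host buffers.
        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (size_t i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        // Device buffers.
        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);

        // Copy the inputs to the GPU.
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        // Launch the kernel.
        const int threads = 256;
        const int blocks = (int)((n + threads - 1) / threads);
        addKernel<<<blocks, threads, 0>>>(d_a, d_b, d_c, n);

        // Copy the result back; this call waits for the kernel to finish.
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", h_c[0]);   // expected 3.0

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }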
Notice that a kernel launch is asynchronous with respect to the host, so the CPU can keep doing other work in parallel while the GPU runs the kernel.
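A small sketch of that overlap, assuming an illustrative kernel: the CPU does independent work after the launch and only synchronizes when it needs the GPU to be finished.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Illustrative kernel that writes its global index into the buffer.
    __global__ void fillKernel(int *out, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = (int)i;
    }

    // Placeholder for independent CPU work that can overlap the kernel.
    static long cpuWork() {
        long sum = 0;
        for (long i = 0; i < 1000000; ++i) sum += i;
        return sum;
    }

    int main() {
        const size_t n = 1 << 20;
        int *d_out = nullptr;
        cudaMalloc(&d_out, n * sizeof(int));

        // The launch returns immediately; the GPU runs the kernel in the background.
        fillKernel<<<(int)((n + 255) / 256), 256>>>(d_out, n);

        long cpuResult = cpuWork();            // CPU work overlaps the kernel

        cudaDeviceSynchronize();               // block until the GPU is done
        printf("CPU result: %ld\n", cpuResult);

        cudaFree(d_out);
        return 0;
    }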