CUDA hello world in C

From ScientificComputing

< Examples

Load modules

[jarunanp@eu-login-10 ~]$ env2lmod
[jarunanp@eu-login-10 ~]$ module load gcc/6.3.0 cuda/11.0.3

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

[jarunanp@eu-login-10 ~]$ which nvcc

CUDA Hello World

  • Go to $SCRATCH and create a work directory
[jarunanp@eu-login-10 ~]$ cd $SCRATCH
[jarunanp@eu-login-10 jarunanp]$ pwd
[jarunanp@eu-login-10 jarunanp]$ mkdir test_cuda
[jarunanp@eu-login-10 jarunanp]$ cd test_cuda
[jarunanp@eu-login-10 test_cuda]$ 
  • Download a CUDA Hello World example
[jarunanp@eu-login-10 test_cuda]$ wget -c -O cuda_hello.c
  • Compile the code
[jarunanp@eu-login-10 test_cuda]$ nvcc cuda_hello.c -o cuda_hello 
  • Test the executable in an interactive batch job on a GPU node
[jarunanp@eu-login-10 test_cuda]$ bsub -R "rusage[ngpus_excl_p=1]" -I "./cuda_hello"
Generic job.
Job <195522896> is submitted to queue <gpu.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-g3-045>>
Hello World from GPU!
[jarunanp@eu-login-10 test_cuda]$
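
The contents of the downloaded file are not shown on this page. As a rough guide, a program that prints the message above could look like the sketch below. The kernel name cuda_hello and the helper launch_hello are hypothetical and are not taken from the downloaded example. Note that nvcc selects the language by file suffix: a .cu file is compiled as CUDA, while a .c file is passed to the host compiler as plain C unless the option -x cu is given.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: executed on the GPU. Device-side printf output is buffered
// and only flushed when the host synchronizes with the device.
// (The name cuda_hello is an assumption for this sketch.)
__global__ void cuda_hello(void) {
    printf("Hello World from GPU!\n");
}

// Hypothetical helper: launches the kernel and returns 0 on success.
int launch_hello(void) {
    // One block containing one thread is enough for a single message.
    cuda_hello<<<1, 1>>>();

    // Wait for the kernel to finish; this also reports launch errors.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main(void) {
    return launch_hello();
}
```

Without the call to cudaDeviceSynchronize, the program may exit before the kernel has run, and no message appears.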

Using CUDA built-in variables

The example codes provided here use the CUDA built-in variables threadIdx.x and blockIdx.x, which hold a thread's index within its block and the block's index within the grid. The examples are taken from a CUDA tutorial.
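
To illustrate how the two built-in variables combine into a global element index, here is a minimal sketch of a vector addition. It is not the tutorial's code: only the kernel signature vector_add(float*, float*, float*, int) and the printed line out[0] = 3.000000 are taken from the profiler output on this page. The problem size N, the block size THREADS_PER_BLOCK, and the helper run_vector_add are assumptions.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 10000000             /* assumed problem size */
#define THREADS_PER_BLOCK 256  /* assumed block size */

// Each thread adds one pair of elements. The global index is built from
// the block index, the block size, and the thread index within the block.
__global__ void vector_add(float *out, float *a, float *b, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {  // the last block may extend past the end of the arrays
        out[tid] = a[tid] + b[tid];
    }
}

// Hypothetical helper: adds two vectors of n elements on the GPU and
// returns out[0], or -1.0f if an allocation or CUDA call failed.
float run_vector_add(int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *out = (float *)malloc(bytes);
    float *d_a = NULL, *d_b = NULL, *d_out = NULL;
    float result = -1.0f;

    if (a && b && out &&
        cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_b, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_out, bytes) == cudaSuccess) {
        for (int i = 0; i < n; i++) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }

        // Two host-to-device copies for the inputs.
        cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

        // Round up so that every element is covered by a thread.
        int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
        vector_add<<<blocks, THREADS_PER_BLOCK>>>(d_out, d_a, d_b, n);

        // One device-to-host copy for the result. This call waits for the
        // kernel and also returns errors from the preceding launch.
        if (cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            result = out[0];
        }
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    free(a);
    free(b);
    free(out);
    return result;
}

int main(void) {
    printf("out[0] = %f\n", run_vector_add(N));
    return 0;
}
```

With blockDim.x threads per block, thread threadIdx.x of block blockIdx.x handles element blockIdx.x * blockDim.x + threadIdx.x, so consecutive threads access consecutive elements.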

  • Compile the code
[jarunanp@eu-login-10 test_cuda]$ module load gcc/6.3.0 cuda/11.0.3
[jarunanp@eu-login-10 test_cuda]$ nvcc -o vector_add_cu
  • Request an interactive session on a compute node
[jarunanp@eu-login-10 test_cuda]$ bsub -R "rusage[ngpus_excl_p=1]" -Is bash
Generic job.
Job <195523378> is submitted to queue <gpu.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-g3-039>>
FILE: /sys/fs/cgroup/cpuset/lsf/euler/job.195523378.50598.1638799736/tasks
[jarunanp@eu-g3-039 test_cuda]$
  • Profile the CUDA executable
[jarunanp@eu-g3-039 test_cuda]$ nvprof ./vector_add_cu
==112917== NVPROF is profiling process 112917, command: ./vector_add_cu
out[0] = 3.000000
==112917== Profiling application: ./vector_add_cu
==112917== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   92.57%  524.00ms         1  524.00ms  524.00ms  524.00ms  vector_add(float*, float*, float*, int)
                    4.63%  26.209ms         1  26.209ms  26.209ms  26.209ms  [CUDA memcpy DtoH]
                    2.80%  15.860ms         2  7.9298ms  7.9215ms  7.9381ms  [CUDA memcpy HtoD]