CUDA hello world in C
From ScientificComputing
Load modules
[jarunanp@eu-login-10 ~]$ env2lmod
[jarunanp@eu-login-10 ~]$ module load gcc/6.3.0 cuda/11.0.3

The following have been reloaded with a version change:
  1) gcc/4.8.5 => gcc/6.3.0

[jarunanp@eu-login-10 ~]$ which nvcc
/cluster/apps/gcc-6.3.0/cuda-11.0.3-qdlibd2luz2fy7izfefao4c5yitxwjus/bin/nvcc
CUDA Hello World
- Go to $SCRATCH and create a work directory
[jarunanp@eu-login-10 ~]$ cd $SCRATCH
[jarunanp@eu-login-10 jarunanp]$ pwd
/cluster/scratch/jarunanp
[jarunanp@eu-login-10 jarunanp]$ mkdir test_cuda
[jarunanp@eu-login-10 jarunanp]$ cd test_cuda
[jarunanp@eu-login-10 test_cuda]$
- Download a CUDA Hello World example
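[jarunanp@eu-login-10 test_cuda]$ wget -c "https://gitlab.ethz.ch/jarunanp/hpc-examples/-/raw/main/cuda/cuda_hello.cu?inline=false" -O cuda_hello.cu
- Compile the code. Keep the .cu extension: nvcc compiles .cu files as CUDA source but passes .c files to the host C compiler, which does not understand CUDA syntax such as __global__ or the <<<...>>> kernel launch.
[jarunanp@eu-login-10 test_cuda]$ nvcc cuda_hello.cu -o cuda_hello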
- Test the executable on a GPU node (bsub -I submits an interactive job, so the program's output appears in your terminal)
[jarunanp@eu-login-10 test_cuda]$ bsub -R "rusage[ngpus_excl_p=1]" -I "./cuda_hello"
Generic job.
Job <195522896> is submitted to queue <gpu.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-g3-045>>
Hello World from GPU!
[jarunanp@eu-login-10 test_cuda]$
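The contents of cuda_hello.cu are not shown on this page. As a rough idea of what such a program looks like, here is a minimal sketch of a CUDA "Hello World"; it is an assumption, not a copy of the downloaded file, and the kernel name cuda_hello is hypothetical. It shows the two essentials: a __global__ kernel launched with <<<blocks, threads>>>, and a synchronisation so the device-side printf output is flushed before the program exits.

```cuda
// Hypothetical sketch of a minimal CUDA "Hello World";
// the downloaded cuda_hello.cu may differ.
#include <stdio.h>

// __global__ marks a kernel: it runs on the GPU and is launched from the host.
__global__ void cuda_hello(void)
{
    printf("Hello World from GPU!\n");
}

int main(void)
{
    // Launch a grid of 1 block containing 1 thread.
    cuda_hello<<<1, 1>>>();

    // Catch errors in the launch itself (e.g. no usable GPU).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Kernel launches are asynchronous: wait for the kernel to finish so
    // its printf output is flushed, and catch errors during execution.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Without the cudaDeviceSynchronize() call, main could return before the kernel runs and nothing would be printed.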
Using CUDA built-in variables
The example codes provided here use the CUDA built-in variables threadIdx.x (the index of a thread within its block) and blockIdx.x (the index of a block within the grid). These examples were taken from this CUDA tutorial.
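To illustrate how these variables combine, the sketch below launches 2 blocks of 4 threads and has each thread print its indices. It also uses blockDim.x (the number of threads per block), another CUDA built-in that this page does not mention; the kernel name show_indices is hypothetical. The expression blockIdx.x * blockDim.x + threadIdx.x is the usual way to give every thread in a 1-D grid a unique global index.

```cuda
// Hypothetical demo of CUDA built-in index variables.
#include <stdio.h>

__global__ void show_indices(void)
{
    // Unique index of this thread across all blocks of a 1-D grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           (int)blockIdx.x, (int)threadIdx.x, i);
}

int main(void)
{
    // 2 blocks x 4 threads per block = 8 threads, global indices 0..7.
    show_indices<<<2, 4>>>();

    // Wait for the kernel so that its printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
```

All eight indices 0 to 7 are printed, but the order of the lines is not guaranteed, because blocks and threads are not required to execute in any particular order.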
- Compile the code
[jarunanp@eu-login-10 test_cuda]$ module load gcc/6.3.0 cuda/11.0.3
[jarunanp@eu-login-10 test_cuda]$ nvcc vector_add.cu -o vector_add_cu
- Request an interactive session on a compute node
[jarunanp@eu-login-10 test_cuda]$ bsub -R "rusage[ngpus_excl_p=1]" -Is bash
Generic job.
Job <195523378> is submitted to queue <gpu.4h>.
<<Waiting for dispatch ...>>
<<Starting on eu-g3-039>>
FILE: /sys/fs/cgroup/cpuset/lsf/euler/job.195523378.50598.1638799736/tasks
[jarunanp@eu-g3-039 test_cuda]$
- Profile the CUDA executable
[jarunanp@eu-g3-039 test_cuda]$ nvprof ./vector_add_cu
==112917== NVPROF is profiling process 112917, command: ./vector_add_cu
out[0] = 3.000000
PASSED
==112917== Profiling application: ./vector_add_cu
==112917== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   92.57%  524.00ms         1  524.00ms  524.00ms  524.00ms  vector_add(float*, float*, float*, int)
                    4.63%  26.209ms         1  26.209ms  26.209ms  26.209ms  [CUDA memcpy DtoH]
                    2.80%  15.860ms         2  7.9298ms  7.9215ms  7.9381ms  [CUDA memcpy HtoD]
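The source of vector_add.cu is not reproduced here. The sketch below is a hypothetical multi-block version written to match the kernel signature and the output shown by nvprof; the array size N, the block size of 256 threads, and the error tolerance MAX_ERR are assumptions. Each thread adds one element, using blockIdx.x and threadIdx.x to find its position, and the host checks every result before printing PASSED.

```cuda
// Hypothetical multi-block vector addition; the tutorial's vector_add.cu may differ.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 10000000   // number of elements (assumed)
#define MAX_ERR 1e-6 // tolerance for the host-side check (assumed)

// One thread per element; signature matches the one reported by nvprof.
__global__ void vector_add(float *out, float *a, float *b, int n)
{
    // Global index of this thread in the 1-D grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may be only partly used, so guard against overrun.
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

int main(void)
{
    size_t bytes = sizeof(float) * N;

    // Host buffers, initialised so that every out[i] should be 3.0.
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *out = (float *)malloc(bytes);
    if (!a || !b || !out) {
        fprintf(stderr, "Host allocation failed\n");
        return 1;
    }
    for (int i = 0; i < N; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    // Device buffers plus two host-to-device copies ("[CUDA memcpy HtoD]").
    float *d_a, *d_b, *d_out;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // Enough 256-thread blocks to cover all N elements (rounded up).
    int block_size = 256;
    int grid_size = (N + block_size - 1) / block_size;
    vector_add<<<grid_size, block_size>>>(d_out, d_a, d_b, N);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy waits for the kernel to finish ("[CUDA memcpy DtoH]").
    cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);

    // Verify every element on the host.
    for (int i = 0; i < N; i++) {
        if (fabsf(out[i] - a[i] - b[i]) > MAX_ERR) {
            printf("FAILED at index %d\n", i);
            return 1;
        }
    }
    printf("out[0] = %f\n", out[0]);
    printf("PASSED\n");

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    free(a);
    free(b);
    free(out);
    return 0;
}
```

If you compile a version like this and run it under nvprof in the same way, you can compare the vector_add row of the profile with the one above to see how spreading the work over many blocks and threads affects kernel time.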