The cuDNN samples are in /usr/src/cudnn_samples_v8
CUDA is installed in /usr/local/cuda
Copy the samples to the home folder
Change into conv_sample
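The copy step can be sketched as a small shell helper. This is only a sketch: the function name copy_cudnn_samples is made up for illustration, and the default source path assumes the samples directory is named cudnn_samples_v8, as in the shell prompt of the build log.

```shell
# Copy the cuDNN samples tree to a writable location and print the
# resulting conv_sample path. Both arguments are optional; the defaults
# are assumptions that mirror the paths used in this post.
copy_cudnn_samples() {
    src="${1:-/usr/src/cudnn_samples_v8}"
    dest="${2:-$HOME}"
    if [ ! -d "$src" ]; then
        echo "samples not found at $src" >&2
        return 1
    fi
    cp -r "$src" "$dest"/ || return 1
    echo "$dest/$(basename "$src")/conv_sample"
}
```

With the defaults, something like `cd "$(copy_cudnn_samples)"` would leave you in ~/cudnn_samples_v8/conv_sample, ready to run make. Copying first matters because /usr/src is normally not writable by a regular user.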
~/cudnn_samples_v8/conv_sample$ make
CUDA_VERSION is 10020
Linking agains cublasLt = true
CUDA VERSION: 10020
TARGET ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -o conv_sample.o -c conv_sample.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o conv_sample fp16_dev.o fp16_emu.o conv_sample.o -I/usr/local/cuda/include -I/usr/local/cuda/targets/ppc64le-linux/include -L/usr/local/cuda/lib64 -L/usr/local/cuda/targets/ppc64le-linux/lib -lcublasLt -lcudart -lcublas -lcudnn -lstdc++ -lm

Run it as a quick test...
~/cudnn_samples_v8/conv_sample$ ./conv_sample
Executing: conv_sample
Using format CUDNN_TENSOR_NCHW (for INT8x4 and INT8x32 tests use CUDNN_TENSOR_NCHW_VECT_C)
Testing single precision
====USER DIMENSIONS====
input dims are 1, 32, 4, 4
filter dims are 32, 32, 1, 1
output dims are 1, 32, 4, 4
====PADDING DIMENSIONS====
padded input dims are 1, 32, 4, 4
padded filter dims are 32, 32, 1, 1
padded output dims are 1, 32, 4, 4
Testing conv
^^^^ CUDA : elapsed = 7.00951e-05 sec, Test PASSED
Testing half precision (math in single precision)
====USER DIMENSIONS====
input dims are 1, 32, 4, 4
filter dims are 32, 32, 1, 1
output dims are 1, 32, 4, 4
====PADDING DIMENSIONS====
padded input dims are 1, 32, 4, 4
padded filter dims are 32, 32, 1, 1
padded output dims are 1, 32, 4, 4
Testing conv
^^^^ CUDA : elapsed = 3.69549e-05 sec, Test PASSED
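
To avoid reading the log by eye, the result can be checked with a small filter. This is a rough sketch: check_conv_sample_output is a hypothetical helper, and it assumes only what the output above shows, namely that each "Testing conv" line is followed by a "Test PASSED" line when the run succeeds.

```shell
# Read conv_sample output on stdin and succeed only if every
# "Testing conv" run is matched by a "Test PASSED" line.
check_conv_sample_output() {
    out=$(cat)
    # grep -c prints 0 but exits non-zero on no match, hence "|| true"
    runs=$(printf '%s\n' "$out" | grep -c 'Testing conv' || true)
    passed=$(printf '%s\n' "$out" | grep -c 'Test PASSED' || true)
    echo "runs=$runs passed=$passed"
    [ "$runs" -gt 0 ] && [ "$runs" -eq "$passed" ]
}
```

It could be used as `./conv_sample | check_conv_sample_output`; for the run above it would report two runs and two passes.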