2018/6/25

Caffe, CUDA

Follow the Caffe installation guide for Ubuntu (>= 17.04):
sudo apt-get install caffe-cuda

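This appears to install only the binaries and libraries; everything else apparently still has to be built from source.
Use apt's build-dep to automatically install the packages needed to build caffe-cuda.
This command requires the deb-src lines in sources.list to be uncommented first.
It also pulls in gcc6.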
sudo apt build-dep caffe-cuda
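
The deb-src step can be scripted. A minimal sketch, assuming the stock Ubuntu layout in which the source entries sit in /etc/apt/sources.list as lines starting with "# deb-src"; the function name enable_deb_src is made up for illustration, and GNU sed is assumed:

```shell
# Hypothetical helper: uncomment every "# deb-src ..." line in the given file.
# -i.bak edits in place and keeps the original as <file>.bak (GNU sed).
enable_deb_src() {
    sed -i.bak -E 's/^#[[:space:]]*(deb-src[[:space:]])/\1/' "$1"
}

# Usage, from a root shell:
#   enable_deb_src /etc/apt/sources.list
#   apt-get update
```

Other comment lines are left untouched, because the pattern only matches a "#" that is directly followed by deb-src.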
Then clone the source: git clone https://github.com/BVLC/caffe.git
Next, following the instructions, copy Makefile.config.example to Makefile.config and edit it.
For a CUDA + CPU build, nothing needs to be changed.
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all
make test
make runtest
However, according to this post, USE_PKG_CONFIG := 1 has to be uncommented.

Then make all fails with: Unsupported gpu architecture 'compute_20'
See this post:
After more research I found that the newest cuda version (9.0) doesn't support compute_20 anymore. 
This means that you have two options, disable the compute_20 target or install cuda version 8.0. 
If your GPU supports newer compute architectures you should use the newest cuda version and disable compute_20.
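Sure enough, Makefile.config contains the following, so with CUDA >= 9.0 the compute_20 entries (sm_20 and sm_21) have to be removed: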
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
# For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
                -gencode arch=compute_20,code=sm_21 \
                -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52 \
                -gencode arch=compute_60,code=sm_60 \
                -gencode arch=compute_61,code=sm_61 \
                -gencode arch=compute_61,code=compute_61
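
The first compute_20 entry shares its line with the CUDA_ARCH := assignment, so simply deleting or commenting that line would break the variable. A minimal sketch that strips only the two compute_20 flags, assuming GNU sed and the exact flag spelling shown above; drop_compute_20 is a hypothetical helper name:

```shell
# Hypothetical helper: remove the compute_20 targets (sm_20 and sm_21)
# from CUDA_ARCH while keeping the assignment and the line continuations.
# -i.bak keeps the original file as <file>.bak (GNU sed).
drop_compute_20() {
    sed -i.bak -E 's/-gencode arch=compute_20,code=sm_2[01] ?//' "$1"
}

# Usage, from the Caffe source directory:
#   drop_compute_20 Makefile.config
```

After the edit the first line reads "CUDA_ARCH := \" and the list continues with the compute_30 entry; a line holding only a backslash is still a valid continuation for make.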

Next comes the error hdf5.h: No such file or directory.
See this post, which collects fixes for several Caffe build problems.
Edit Makefile.config:
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

After that comes the linker error cannot find -lhdf5_hl, -lhdf5.
Again following the same link, edit the Makefile (not Makefile.config this time):
-LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
+LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
OK
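
Before hard-coding the include path, it may help to confirm where the HDF5 header actually lives on the machine. A small sketch; find_hdf5_include_dirs is a hypothetical helper name, and /usr/include as the default search root is an assumption about the Debian/Ubuntu layout:

```shell
# Hypothetical helper: print every directory under a search root that
# contains hdf5.h. The root defaults to /usr/include (assumed layout).
find_hdf5_include_dirs() {
    find "${1:-/usr/include}" -name hdf5.h -exec dirname {} \; 2>/dev/null | sort -u
}

# Usage:
#   find_hdf5_include_dirs
#   find_hdf5_include_dirs /usr/local/include
```

Whatever directory it prints is the one to append to INCLUDE_DIRS. If nothing is printed, the HDF5 development package is probably not installed at all.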

Next, work through A step by step guide to Caffe and Training LeNet on MNIST with Caffe.

An OpenCV error shows up:
.build_release/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'
If libopencv really is installed,
edit Makefile.config:
 # Uncomment to use `pkg-config` to specify OpenCV library paths.
 # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
-# USE_PKG_CONFIG := 1
+USE_PKG_CONFIG := 1
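
In OpenCV 3, cv::imread lives in the opencv_imgcodecs library, so the undefined reference usually means that library is missing from the link line. A rough check of what pkg-config would hand to the build, assuming the pkg-config module is named opencv (it may differ by OpenCV version and distribution); has_imgcodecs is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if a string of linker flags mentions
# the opencv_imgcodecs library, fail otherwise.
has_imgcodecs() {
    case "$1" in
        *opencv_imgcodecs*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the module name "opencv" is an assumption):
#   has_imgcodecs "$(pkg-config --libs opencv)" && echo "imgcodecs will be linked"
```

If the check fails even with USE_PKG_CONFIG := 1, the problem is in the OpenCV installation rather than in Makefile.config.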

To run the tools under python/, some Python modules are also needed; they are listed in python/requirements.txt:
sudo pip install -r requirements.txt


make runtest then fails with:
Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
The cause turned out to be that /usr/share/cuda was linked to cuda-10.1, while nvidia-smi reported CUDA version 10.0.
In other words, the driver and the library did not match (probably the result of an apt update).
Re-pointing /usr/share/cuda at cuda-10.0 fixed the problem.
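
The same comparison can be scripted. A sketch under a few assumptions: the toolkit directory is named cuda-<version>, readlink -f is the GNU coreutils form, and nvidia-smi prints "CUDA Version: X.Y" in its header (recent drivers do). All three function names are hypothetical:

```shell
# Print the version suffix of the directory a CUDA symlink resolves to,
# e.g. a link pointing at .../cuda-10.1 yields 10.1.
cuda_link_version() {
    target=$(readlink -f "$1") || return 1
    name=$(basename "$target")
    printf '%s\n' "${name#cuda-}"
}

# Read nvidia-smi output on stdin and print the CUDA version from its header.
driver_cuda_version() {
    sed -n -E 's/.*CUDA Version: *([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Report both versions for the symlink path passed in by the caller.
report_cuda_versions() {
    toolkit=$(cuda_link_version "$1")
    driver=$(nvidia-smi | driver_cuda_version)
    echo "toolkit link: $toolkit, driver supports up to: $driver"
    [ "$toolkit" = "$driver" ] || echo "mismatch: re-point the link or update the driver"
}

# Usage:
#   report_cuda_versions /usr/share/cuda
```

Strictly speaking the versions need not be equal: the error appears when the runtime is newer than what the driver supports, which is the 10.1 versus 10.0 case described here.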
