Clone caffe-jacinto-models so that it sits at the same level as caffe-jacinto:
    ./
    |-- caffe-jacinto
    |-- caffe-jacinto-models
After cloning, check out the same version (branch) as caffe-jacinto: caffe-0.17
Set the environment variables:
    export PYTHONPATH=~/caffe-jacinto/python
    export CAFFE_ROOT=~/caffe-jacinto
That completes the installation.
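Before running any of the training scripts, it can help to confirm that the two variables are actually visible to Python. The following is only a minimal sketch: check_caffe_env is a hypothetical helper of mine, not part of caffe-jacinto, and it assumes the layout used above (CAFFE_ROOT pointing at the checkout, and its python subdirectory listed in PYTHONPATH).

```python
import os


def _norm(path):
    # Expand "~" and collapse redundant separators so paths compare reliably.
    return os.path.normpath(os.path.expanduser(path))


def check_caffe_env(env=None):
    """Return a list of problems found in CAFFE_ROOT / PYTHONPATH (hypothetical helper)."""
    env = os.environ if env is None else env
    caffe_root = env.get("CAFFE_ROOT")
    if not caffe_root:
        return ["CAFFE_ROOT is not set"]

    problems = []
    root = _norm(caffe_root)
    if not os.path.isdir(root):
        problems.append("CAFFE_ROOT is not a directory: " + root)

    # PYTHONPATH is expected to contain <CAFFE_ROOT>/python, as in the exports above.
    entries = [_norm(p) for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    expected = os.path.join(root, "python")
    if expected not in entries:
        problems.append("PYTHONPATH does not include " + expected)
    return problems


if __name__ == "__main__":
    for problem in check_caffe_env():
        print(problem)
```

An empty list means both variables look consistent; it does not prove that caffe itself was built correctly.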
Next, run the script for whichever example you need.
ref:
SSD
scripts/train_image_object_detection.sh is the script that runs SSD training on the caffe-ssd training data.
The script itself shows that it uses the caffe-ssd training data:
    if [ $dataset = "voc0712" ]
    then
      train_data="../../caffe-jacinto/examples/VOC0712/VOC0712_trainval_lmdb"
      test_data="../../caffe-jacinto/examples/VOC0712/VOC0712_test_lmdb"
      name_size_file="../../caffe-jacinto/data/VOC0712/test_name_size.txt"
      label_map_file="../../caffe-jacinto/data/VOC0712/labelmap_voc.prototxt"
So first follow the caffe-ssd steps:
Download the training data to ~/data.
In caffe-jacinto, run create_list.sh and create_data.sh to prepare the VOC0712 dataset.
    chmod a+x data/VOC0712/*.sh
    ./data/VOC0712/create_list.sh
    ./data/VOC0712/create_data.sh
Then go to caffe-jacinto-models and run train_image_object_detection.sh. Judging from the contents of train_image_object_detection.sh, it has to be run from inside the scripts directory.
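Since the script refers to the dataset through relative paths, a quick check that the four inputs exist can save a failed run. This is a rough sketch only: the paths are copied from the voc0712 branch quoted above, while missing_voc0712_inputs is a hypothetical helper name and not something the repository provides.

```python
import os

# Relative paths copied from the voc0712 branch of train_image_object_detection.sh.
# They are relative to caffe-jacinto-models/scripts.
VOC0712_PATHS = {
    "train_data": "../../caffe-jacinto/examples/VOC0712/VOC0712_trainval_lmdb",
    "test_data": "../../caffe-jacinto/examples/VOC0712/VOC0712_test_lmdb",
    "name_size_file": "../../caffe-jacinto/data/VOC0712/test_name_size.txt",
    "label_map_file": "../../caffe-jacinto/data/VOC0712/labelmap_voc.prototxt",
}


def missing_voc0712_inputs(scripts_dir="."):
    """Return the names (sorted) of the inputs that do not exist yet."""
    missing = []
    for name, rel in sorted(VOC0712_PATHS.items()):
        if not os.path.exists(os.path.join(scripts_dir, rel)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from caffe-jacinto-models/scripts, the same place as the training script.
    for name in missing_voc0712_inputs():
        print("missing:", name, "->", VOC0712_PATHS[name])
```

If anything is reported missing, create_list.sh and create_data.sh probably have not been run yet, or the two repositories are not side by side.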
Result:
    2019-07-26 11:39:44 (10.2 MB/s) - ‘training/imagenet_jacintonet11v2_iter_320000.caffemodel’ saved [11516054/11516054]
    Logging output to training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/train-log_20190726_11-39.txt
    training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/initial
    Traceback (most recent call last):
      File "./models/image_object_detection.py", line 5, in <module>
        from models.model_libs import *
    ImportError: No module named models.model_libs
    training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/l1reg
    Traceback (most recent call last):
      File "./models/image_object_detection.py", line 5, in <module>
        from models.model_libs import *
    ImportError: No module named models.model_libs
    training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/sparse
    Traceback (most recent call last):
      File "./models/image_object_detection.py", line 5, in <module>
        from models.model_libs import *
    ImportError: No module named models.model_libs
    training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test
    Traceback (most recent call last):
      File "./models/image_object_detection.py", line 5, in <module>
        from models.model_libs import *
    ImportError: No module named models.model_libs
    training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize
    Traceback (most recent call last):
      File "./models/image_object_detection.py", line 5, in <module>
        from models.model_libs import *
    ImportError: No module named models.model_libs
    cat: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/deploy.prototxt: No such file or directory
    cat: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/test.prototxt: No such file or directory
    ./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/initial/run.sh: No such file or directory
    ./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/l1reg/run.sh: No such file or directory
    ./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/sparse/run.sh: No such file or directory
    ./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test/run.sh: No such file or directory
    ./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/run.sh: No such file or directory
This looks like a module path problem.
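When image_object_detection.py imports Python modules from its own directory, it prefixes each of them with models. (for example, from models.model_libs import *). The script is launched as ./models/image_object_detection.py, so Python puts scripts/models, not scripts, at the front of sys.path, and the models package cannot be found.
Removing the models. prefix makes this error go away.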
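The edit can also be expressed as a small text transformation, which is handy if several files need it. This is only a sketch under my own assumptions: strip_models_prefix is a hypothetical helper, and model_libs is the only module name confirmed by the log above, so check any other imports by hand.

```python
import re

# Matches "from models.X ..." or "import models.X ..." at the start of a line
# and captures everything before the "models." prefix.
_MODELS_IMPORT_RE = re.compile(r"^([ \t]*(?:from|import)[ \t]+)models\.", re.MULTILINE)


def strip_models_prefix(source):
    """Drop the leading 'models.' package prefix from import statements."""
    return _MODELS_IMPORT_RE.sub(r"\1", source)


if __name__ == "__main__":
    before = "from models.model_libs import *\n"
    print(strip_models_prefix(before), end="")
```

Only import lines are touched; other uses of the name models in the file are left alone. An alternative that I did not try would be to keep the prefix and put the scripts directory on PYTHONPATH instead.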
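The next error is:
    Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
The fix is similar: find every gpus= setting in the *.sh and *.py files under scripts and change it to gpus="0". The defaults list GPUs 0 and 1, and this error means that one of the listed devices does not exist on the machine.
For example: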
    diff --git a/scripts/train_image_object_detection.sh b/scripts/train_image_object_detection.sh
    index 0e5f99d..f9771f4 100755
    --- a/scripts/train_image_object_detection.sh
    +++ b/scripts/train_image_object_detection.sh
    @@ -5,7 +5,7 @@ DATE_TIME=`date +'%Y%m%d_%H-%M'`
     #-------------------------------------------------------
     #------------------------------------------------
    -gpus="0,1" #"0,1,2"
    +gpus="0" #"0,1,2"
    diff --git a/scripts/models/image_object_detection.py b/scripts/models/image_object_detection.py
    index 67823a1..f625615 100644
    --- a/scripts/models/image_object_detection.py
    +++ b/scripts/models/image_object_detection.py
    @@ -360,7 +360,7 @@ def main():
         # Which layers to freeze (no backward) during training.
         config_param.freeze_layers = []
         # Defining which GPUs to use.
    -    config_param.gpus = "0,1" #gpus = "0"
    +    config_param.gpus = "0" #gpus = "0"
         config_param.batch_size = 32
         config_param.accum_batch_size = 32
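The same change as in the diff above can be sketched as a line-by-line rewrite, in case there are many files to touch. Treat this as an illustration rather than the way I actually made the change: force_single_gpu is a hypothetical helper, and it assumes the value is always written in double quotes, as in the two files shown.

```python
import re

# A gpus assignment with a double-quoted value, e.g. gpus="0,1" or config_param.gpus = "0,1".
_GPUS_RE = re.compile(r'(\bgpus\s*=\s*)"[^"]*"')


def force_single_gpu(text, device="0"):
    """Rewrite the first gpus="..." assignment on each line to use a single device."""
    out = []
    for line in text.splitlines(keepends=True):
        # count=1 leaves any second occurrence (such as one in a trailing comment) untouched.
        out.append(_GPUS_RE.sub(lambda m: m.group(1) + '"' + device + '"', line, count=1))
    return "".join(out)


if __name__ == "__main__":
    print(force_single_gpu('gpus="0,1" #"0,1,2"\n'), end="")
```

Unquoted values such as gpus=0,1 are not matched, and a line whose only gpus= is inside a comment will also be rewritten, so review the result with git diff before training.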
I pushed the modified version to https://github.com/checko/caffe-jacinto-models/tree/zoey-wpc
Also keep an eye on GPU memory.
The batch_size in train_image_object_detection.sh is 16, which just fits on an 8 GB GPU card.
If the GPU has only 4 GB of memory, change it to 8.
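The two data points above (16 on 8 GB, 8 on 4 GB) suggest scaling the batch size in proportion to GPU memory. The sketch below assumes that linear relation holds for other sizes as well, which I have not verified; suggest_batch_size is a hypothetical helper, and real memory use also depends on the network and input resolution.

```python
def suggest_batch_size(gpu_mem_gb, ref_mem_gb=8, ref_batch_size=16):
    """Scale the batch size linearly from the reference point (16 on an 8 GB card).

    Linear scaling is an assumption, not a measured result.
    """
    if gpu_mem_gb <= 0:
        raise ValueError("gpu_mem_gb must be positive")
    # Round down so the suggestion never exceeds the proportional share of memory.
    return max(1, int(ref_batch_size * gpu_mem_gb / ref_mem_gb))


if __name__ == "__main__":
    for mem in (4, 8):
        print(mem, "GB ->", suggest_batch_size(mem))
```

Use the result only as a starting point, and lower it further if training still runs out of memory.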