High Altitude Oolong: Nvidia DIGITS.

nvidia/caffe links to DIGITS, so I decided to give it a try.
The documentation uses NVIDIA's old version of docker,
so here is an attempt with the new one:

~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvidia/digits
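To make the flags in the command above easier to read, here is a minimal sketch that builds the same argument list in Python. The helper name `digits_run_command` and its parameters are hypothetical, made up for illustration; only the flags themselves come from the command above.

```python
import os
import shlex


def digits_run_command(image="nvidia/digits", host_port=5000,
                       host_dir="~/dockerfolder"):
    """Build the `docker run` argument list shown above (hypothetical helper)."""
    # No shell is involved when passing a list to subprocess, so expand ~ here.
    host_dir = os.path.expanduser(host_dir)
    return [
        "docker", "run",
        "--gpus", "all",                    # expose all GPUs to the container
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,video,utility",
        "-itd",                             # interactive, with a tty, detached
        "-p", f"{host_port}:5000",          # host port -> DIGITS web port
        "-v", f"{host_dir}:/dockerfolder",  # share a host folder with the container
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(digits_run_command()))
```

Passing the list to `subprocess.run` would start the container; printing it first is the safer way to check the port and volume mapping.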

Then I opened ip:5000 in a browser, and sure enough the page came up.
It looks like the newer images have to be pulled from NGC, which is a bit of a hassle.

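The dataset has to be downloaded by hand, and the dataset folder is then specified in the web interface; everything else can be done from the web page.
See this article, or the official documentation, for clearer step-by-step instructions.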

Since the browser on the client PC would otherwise have to upload the images, it is still better to specify a volume mapping when running the image.

The official project has some training examples.
There is also a semantic segmentation example.

When I followed it to create the VOC dataset, the size kept showing 0.
I exported the log to take a look. It starts with:

libdc1394 error: Failed to initialize libdc1394
2019-11-08 09:03:39 [INFO ] Created features db for stage train_db in train_db/features
2019-11-08 09:03:42 [INFO ] Created labels db for stage train_db in train_db/labels
2019-11-08 09:03:42 [INFO ] Processed 1/46
2019-11-08 09:03:42 [INFO ] Processed 2/46
2019-11-08 09:03:42 [INFO ] Processed 3/46

Then I found this post:

# ln /dev/null /dev/raw1394

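The error no longer appeared in the log, but the page still showed size 0.
After switching to the Datasets tab and clicking VOC to open its page, it was no longer 0. Following the instructions to export the db, I could see the images.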
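When the web page shows size 0, the exported log is a second source for how far the job actually got. The sketch below pulls the last "Processed n/total" entry out of log text in the format shown earlier. The function name `last_progress` is hypothetical, and the regular expression is only an assumption based on those few log lines.

```python
import re

# Matches lines such as: "2019-11-08 09:03:42 [INFO ] Processed 3/46"
PROGRESS = re.compile(r"\[INFO\s*\] Processed (\d+)/(\d+)")


def last_progress(log_text):
    """Return (done, total) from the last 'Processed n/total' line, or None."""
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, total = matches[-1]
    return int(done), int(total)


sample = """\
libdc1394 error: Failed to initialize libdc1394
2019-11-08 09:03:42 [INFO ] Created labels db for stage train_db in train_db/labels
2019-11-08 09:03:42 [INFO ] Processed 1/46
2019-11-08 09:03:42 [INFO ] Processed 2/46
2019-11-08 09:03:42 [INFO ] Processed 3/46
"""

if __name__ == "__main__":
    print(last_progress(sample))  # (3, 46)
```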

Next, run the script to generate the pretrained parameters (caffemodel).
Customize the network (cut and paste).
Training then failed with an error:

ERROR: Cannot copy param 0 weights from layer 'fc6'; 
shape mismatch. Source param shape is 1 1 4096 9216 (37748736); 
target param shape is 4096 256 6 6 (37748736). 
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

Googling the error pointed to an answer on Stack Overflow:

Rename the layer from "loss1/classifier" to "loss1/classifier_retrain".

When fine-tuning a model, here's what Caffe does:

# pseudo-code
for layer in new_model:
  if layer.name in old_model:
    new_model.layer.weights = old_model.layer.weights

You're getting an error because the weights for "loss1/classifier" were for a 1000-class classification problem (1000x1024), 
and you're trying to copy them into a layer for a 6-class classification problem (6x1024). 
When you rename the layer, Caffe doesn't try to copy the weights for that layer and you get randomly initialized weights - which is what you want.
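The pseudo-code in the quoted answer can be turned into a small runnable sketch. This is not Caffe's API: the models are plain dicts, and `copy_matching_weights` is a hypothetical name. It only illustrates the idea of copying weights by layer name, failing on a shape mismatch, and skipping renamed layers.

```python
def copy_matching_weights(new_model, old_model):
    """Copy weights layer by layer when names match (illustrative, not Caffe)."""
    copied = []
    for name, layer in new_model.items():
        if name not in old_model:
            continue  # renamed or new layer: keeps its fresh initialization
        source = old_model[name]
        if source["shape"] != layer["shape"]:
            raise ValueError(
                f"Cannot copy weights for layer {name!r}: shape mismatch, "
                f"source {source['shape']} vs target {layer['shape']}"
            )
        layer["weights"] = source["weights"]
        copied.append(name)
    return copied


old = {"loss1/classifier": {"shape": (1000, 1024), "weights": "pretrained"}}

# Same name, different shape: the copy is refused.
same_name = {"loss1/classifier": {"shape": (6, 1024), "weights": None}}
try:
    copy_matching_weights(same_name, old)
except ValueError as err:
    print(err)

# Renamed layer: nothing is copied, so it would be trained from scratch.
renamed = {"loss1/classifier_retrain": {"shape": (6, 1024), "weights": None}}
print(copy_matching_weights(renamed, old))  # []
```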

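It turned out to be something else entirely.
The real cause was simply that converting the alexnet pretrained model into the fcn-alexnet pretrained model requires running net_surgery.sh.
The script downloads bvlc_alexnet.XX and then needs to reference fcn_alexnet.deploy.prototxt.
This fcn_alexnet.deploy.prototxt lives in examples/semantic-segmentation in DIGITS, so the repo has to be cloned and the file copied into the directory where net_surgery.sh is run.
Only then is fcn_alexnet.caffemodel generated correctly.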
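The two shapes in the error message hint at why a conversion step is needed: both hold the same number of values, only laid out differently. The sketch below checks that arithmetic and shows how one position in the fully connected matrix would map onto a 256x6x6 convolution kernel. The helper `fc_to_conv_index` is hypothetical, and the row-major (channel, height, width) flattening order is an assumption, not something confirmed by the script.

```python
from math import prod

fc_shape = (1, 1, 4096, 9216)    # source param shape from the error message
conv_shape = (4096, 256, 6, 6)   # target param shape from the error message

# Same element count, different layout: 256 * 6 * 6 == 9216.
print(prod(fc_shape), prod(conv_shape))  # 37748736 37748736


def fc_to_conv_index(out_idx, flat_in_idx, height=6, width=6):
    """Map (output, flattened input) to (output, channel, row, col).

    Assumes the input was flattened in row-major (channel, height, width) order.
    """
    channel, rest = divmod(flat_in_idx, height * width)
    row, col = divmod(rest, width)
    return out_idx, channel, row, col


print(fc_to_conv_index(0, 9215))  # (0, 255, 5, 5)
```

Under that assumption no values change in the conversion; the same numbers are only reinterpreted with a different shape, which is why the pretrained file has to be regenerated rather than the layer renamed.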

The docker version on GitHub only goes up to version 16, so the GAN model from the examples is not installed.
The repo referenced in the NVIDIA Deep Learning DIGITS documentation goes up to version 19.0.

Following the instructions:

docker pull nvcr.io/nvidia/digits:19.10-tensorflow

Then start it the same way as above:

~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvcr.io/nvidia/digits:19.10-tensorflow

Then open a browser on port 5000 to reach the DIGITS web page; on the right, under Images > New, GAN is now available to choose.

2019/11/8
