2019/12/24

Running OpenMV IDE with docker

docker run -it --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /dev:/dev openmv/v2
Build OpenMV IDE from source: https://github.com/openmv/openmv-ide
sudo apt-get install libqt5serialport5-dev qtdeclarative5-dev qtbase5-private-dev qttools5-dev-tools chrpath p7zip-full
make.py fails:
Everything is Ok
python -u /data/openmv-ide/qt-creator/scripts/packageIfw.py -i "None" -v 2.2.0 -a "/data/openmv-ide/build/openmv-ide-linux-x86_64-2.2.0-installer-archive.7z" openmv-ide-linux-x86_64-2.2.0 && python -u /data/openmv-ide/qt-creator/scripts/sign.py "openmv-ide-linux-x86_64-2.2.0.run"
Cleaning up...
Done.
Traceback (most recent call last):
  File "/data/openmv-ide/qt-creator/scripts/packageIfw.py", line 155, in 
    main()
  File "/data/openmv-ide/qt-creator/scripts/packageIfw.py", line 147, in main
    subprocess.check_call(ifw_call, stderr=subprocess.STDOUT)
  File "/usr/lib/python2.7/subprocess.py", line 536, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Makefile:335: recipe for target 'installer' failed
make: *** [installer] Error 1
Make Failed...
The Linux part of make() in make.py:
        # Build...
        if os.system("cd " + builddir +
                     " && qmake ../qt-creator/qtcreator.pro -r -spec linux-g++" +
                     " && make -r -w -j" + str(cpus) +
                     " && make installer INSTALL_ROOT=" + installdir + " IFW_PATH=" + str(ifdir)):
            sys.exit("Make Failed...")
        installer = glob.glob(os.path.join(builddir, "openmv-ide-*.run"))[0]
Looking for the source code behind the Connect button:
$ grep -r 'No OpenMV Cams found' *
openmv/openmvplugin.cpp:                    tr("No OpenMV Cams found!"));
qtserialport sources: https://stackoverflow.com/questions/34758196/trouble-using-qserialport-in-ubuntu

Modified openmvplugin.cpp to add debug messages. On converting the portName() QString to a C string, see https://stackoverflow.com/questions/5505221/converting-qstring-to-char;
with that, printf() can print the debug messages.
The messages showed that the tty serial ports were all detected correctly, but every VID came back as "No VID".
In the end, running this inside the container fixed it:
udevadm trigger
The docker run command became:
docker run -it --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /dev:/dev -v ~/ddata:/data -v /etc/udev:/etc/udev -v /sys:/sys openmv/v2:packageok

Redoing everything from scratch:

docker run -it --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /dev:/dev -v ~/ddata:/data   ubuntu:18.04
Then add a user:
adduser --uid 1000 charles-chang
Install packages:
apt-get update
apt-get install sudo vim
Then edit /etc/group to add charles-chang to the sudo group, and su - charles-chang to switch to that user.
Next come the packages needed to run OpenMV IDE. First install the command needed to add a PPA, add the gcc-arm-embedded PPA, and update:
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:team-gcc-arm-embedded/ppa
sudo apt-get update
Install the required packages:
sudo apt-get install gcc-arm-embedded libc6-i386 python2.7 python-dev python-pip libusb-1.0-0 libusb-1.0-0-dev python-gtksourceview2 git  make
Install the Python modules with pip:
sudo pip install numpy pyserial==2.7 pyusb==1.0.0b2 Pillow
Install the packages OpenMV IDE needs:
sudo apt install imagemagick libqt5widgets5 libqt5gui5 libqt5core5a qt5-default qt5-qmake wget g++ libgl1-mesa-dev
Download the OpenMV IDE installer into the mapped volume /data.

chmod +x and run it. A dialog appears; change the install directory to /data. When "authorization required" shows up, enter the sudo password. OK.
The dialog then reports disk space 0.0...



It turned out that /lib/systemd/systemd-udevd has to be started manually,
with & to run it as a daemon; only then do USB plug/unplug events get updated...

udevadm trigger must also be run, so that systemd-udevd keeps working after the committed image is started (otherwise systemd-udevd has to be started manually again).

So, install systemd-udevd:
apt-get install udev
This provides both systemd-udevd and udevadm.

One more thing:

Only after mapping (-v) /sys into the container and running systemd-udevd and udevadm trigger as a regular user (not root) does the committed image work automatically on later runs, with no need to start systemd-udevd by hand again...
--- The guess is that, through /sys, it rides on the host's udevd.



final...

Start docker with the following command:
docker run -it --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /dev:/dev -v ~/ddata:/data  -v /sys:/sys ubuntu
The host's .X11-unix socket checks user permissions, so a user with the same uid/gid as on the host must also be created inside the container.

After installing all the packages so that OpenMV IDE starts normally:
sudo apt-get install udev
sudo /lib/systemd/systemd-udevd &
sudo udevadm trigger
Test whether OpenMV IDE can detect the OpenMV cam. If it works, exit the container and commit the image.
From then on, just start that image with the run command above; there is no need to start systemd-udevd again.

Before launching OpenMV IDE, su to the user created earlier, since the uid/gid must match the host user's.
Use plain su without switching the environment (not su -), otherwise DISPLAY has to be exported again.

2019/12/10

Garbled Simplified Chinese filenames from unzip

The filenames are GBK-encoded and must be converted to UTF-8 to display correctly.

In the end, the method from this article worked:
LANG=C 7za x your-zip-file.zip
convmv -f GBK -t utf8 --notest -r .
Tried unzip too: with unzip, LANG=C has no effect.

Likewise, for Traditional Chinese filenames, replace GBK with BIG5.
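The recovery can also be sketched in Python. `fix_name` below is a hypothetical helper: it assumes the extractor mis-decoded the raw GBK bytes as cp437 (which is what unzip typically does), so re-encoding as cp437 restores the original bytes, which then decode properly:

```python
def fix_name(garbled, codec="gbk"):
    # Turn the mojibake back into the original raw bytes (cp437 maps
    # every byte to a character, so this round-trips losslessly),
    # then decode with the correct codec.
    # For Traditional Chinese archives, pass codec="big5".
    return garbled.encode("cp437").decode(codec)

# Round-trip demo: simulate what happens to a GBK-encoded name.
original = "中文文件.txt"
garbled = original.encode("gbk").decode("cp437")  # the on-screen garbage
print(fix_name(garbled))  # -> 中文文件.txt
```

This is equivalent to what convmv does on disk, just applied to a single name in memory.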

2019/12/5

! The argument `showcommands` is no longer supported.
! Instead, the verbose log is always written to a compressed file in the output dir:
!
!   gzip -cd out/verbose.log.gz | less -R
!

2019/11/26

'+' mark at the end of the file attributes

drwxrwxr-x  30 aiden aiden      4096 Nov 24 22:18 tmp
drwxrwxr-x+ 30 aiden aiden      4096 Nov 25 00:30 aosp
-rw-r-----+  1 aiden aiden         0 Nov 25 14:39 1234.txt
As shown above, some directories/files have a '+' at the end of the attribute string.

According to "+ or @ after ls -al":
The + suffix means the file has an access control list, and is common in any *nix that supports ACLs. Giving ls the -e flag will make it show the associated ACLs after the file, and chmod can be used to modify them. Most of this is from the chmod man page:

2019/11/25

mkfs.fat 4.1 -- reserved sector = 1

ref: FAT filesystem
Name             Offset
---------------------------
JmpBoot       :  0
OEMName       :  3
Bytes Per Sec : 11
Sec Per Clust : 13
Reserv Sec Cnt: 14
Num FATs      : 16
So Reserved Sector Count is at byte offset 14 (0x0E):

Number of sectors in the reserved area. This field must not be 0, because the boot sector itself, which contains this BPB, lies in the reserved area. To avoid compatibility problems, it should be 1 on FAT12/16 volumes, because some old FAT drivers ignore this field and assume the reserved area size is 1. On FAT32 volumes it is typically 32. Microsoft's OS properly supports any value of 1 or larger.
The part of fat16copy.py that opens the image file:
    f = open(path, "r+b")

    self.f = f

    f.seek(0xb)
    bytes_per_sector = read_le_short(f)
    print(bytes_per_sector)
    sectors_per_cluster = read_byte(f)
    print(sectors_per_cluster)

    self.bytes_per_cluster = bytes_per_sector * sectors_per_cluster

    reserved_sectors = read_le_short(f)
    print(reserved_sectors)
    assert reserved_sectors == 1, \
        "Can only handle FAT with 1 reserved sector"

reserved_sectors is read at 0x0b + 2 (short) + 1 (byte) = 0x0e, which matches.
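The offset arithmetic can be double-checked with a few lines of Python on a synthetic header (the field values here are made up; only the offsets come from the table above):

```python
import struct

# Fake BPB prefix, long enough to cover offsets 0x0B..0x10.
bpb = bytearray(17)
struct.pack_into("<H", bpb, 0x0B, 512)  # Bytes Per Sec
bpb[0x0D] = 16                          # Sec Per Clust
struct.pack_into("<H", bpb, 0x0E, 1)    # Reserv Sec Cnt
bpb[0x10] = 2                           # Num FATs

# Reading short, byte, short, byte from 0x0B -- the same sequence
# fat16copy.py uses -- lands the second short exactly on 0x0E.
fields = struct.unpack_from("<HBHB", bpb, 0x0B)
print(fields)  # (512, 16, 1, 2)
```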

Checking with hd and mkfs.fat:
~$ mkfs.fat 500M
mkfs.fat 4.1 (2017-01-24)
charles-chang@zoeymkII:~$ hd -n 16 500M 
00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 10 10 00  |.<.mkfs.fat.....|
00000010
charles-chang@zoeymkII:~$ mkfs.fat -a -R1 500M
mkfs.fat 4.1 (2017-01-24)
charles-chang@zoeymkII:~$ hd -n 16 500M 
00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 10 01 00  |.<.mkfs.fat.....|
00000010
So with mkfs.fat 4.1, add these two options:
  • -a : disable alignment
  • -R1 : reserved sectors = 1
and version 4.1 also produces a FAT image with reserved_sector=1 correctly.

Tested with fat16copy.py: OK.

2019/11/20

build android P for rpi3

Following this guide, trying Android P on the rpi3.

Android sources from Google (https://android.googlesource.com/platform/manifest).
As described there, clone device/brobwind/rpi3, add the local_manifest, then repo sync:
repo init -u https://android.googlesource.com/platform/manifest -b android-9.0.0_r50
repo sync

An mkfs.fat error appeared; reportedly a problem with version 4.1, so:
build/make$ git diff
diff --git a/tools/fat16copy.py b/tools/fat16copy.py
index c20930a47..18541e88a 100755
--- a/tools/fat16copy.py
+++ b/tools/fat16copy.py
@@ -465,8 +465,8 @@ class fat(object):
     self.bytes_per_cluster = bytes_per_sector * sectors_per_cluster
 
     reserved_sectors = read_le_short(f)
-    assert reserved_sectors == 1, \
-        "Can only handle FAT with 1 reserved sector"
+#    assert reserved_sectors == 1, \
+#        "Can only handle FAT with 1 reserved sector"
 
     fat_count = read_byte(f)
     assert fat_count == 2, "Can only handle FAT with 2 tables"
==> This did not work; for the details see here.

Then another error: No module named mako.template.
Fix: apt-get install python-mako

The out directory is on a network share, via a loopback device. Clean build time:
#### build completed successfully (02:48:25 (hh:mm:ss)) ####
The out directory takes 54 GB in total.

Because the out directory is a loopback device (its own disk), the NFS export needs nohide,crossmnt added.

Flashing (SD card)

In out/target/product/rpi3:
out/target/product/rpi3$ sudo OUT=. ~/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/create_partition_table.sh /dev/sdb
 => Destroy partition table ...

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory. 
***************************************************************

GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
 => Install GPT partition table ...
 => Install hybrid MBR partition table ...
 => Install images ....
     => Install: rpiboot(./rpiboot.img) image ...
131072+0 records in
131072+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 17.1989 s, 3.9 MB/s
     => Install: boot_a(./boot.img) image ...
39008+0 records in
39008+0 records out
19972096 bytes (20 MB, 19 MiB) copied, 5.31937 s, 3.8 MB/s
     => Install: system_a(./system.img) image ...
1331200+0 records in
1331200+0 records out
681574400 bytes (682 MB, 650 MiB) copied, 152.134 s, 4.5 MB/s
     => Install: misc(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/misc.img) image ...
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.336952 s, 3.1 MB/s
     => Install: vendor_a(./vendor.img) image ...
524288+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 69.0102 s, 3.9 MB/s
     => Install: oem_bootloader_a(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/oem_bootloader_a.img) image ...
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 1.17575 s, 3.6 MB/s
     => Install: userdata(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/zero_4k.bin) image ...
8+0 records in
8+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0298523 s, 137 kB/s
 => Dump partition table ....
Disk /dev/sdb: 15564800 sectors, 7.4 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A395EE6C-31A1-4A03-BE8E-6A65F0700662
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 15564766
Partitions will be aligned on 8-sector boundaries
Total free space is 6 sectors (3.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              40          131111   64.0 MiB    FFFF  rpiboot
   2          131112          133159   1024.0 KiB  FFFF  uboot_a
   3          133160          135207   1024.0 KiB  FFFF  uboot_b
   4          135208          200743   32.0 MiB    FFFF  boot_a
   5          200744          266279   32.0 MiB    FFFF  boot_b
   6          266280         1597479   650.0 MiB   FFFF  system_a
   7         1597480         2928679   650.0 MiB   FFFF  system_b
   8         2928680         2928807   64.0 KiB    FFFF  vbmeta_a
   9         2928808         2928935   64.0 KiB    FFFF  vbmeta_b
  10         2928936         2930983   1024.0 KiB  FFFF  misc
  11         2930984         3455271   256.0 MiB   FFFF  vendor_a
  12         3455272         3979559   256.0 MiB   FFFF  vendor_b
  13         3979560         3987751   4.0 MiB     FFFF  oem_bootloader_a
  14         3987752         3995943   4.0 MiB     FFFF  oem_bootloader_b
  15         3995944         4000039   2.0 MiB     FFFF  frp
  16         4000040         4786471   384.0 MiB   FFFF  swap
  17         4786472        15564766   5.1 GiB     FFFF  userdata
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A B4DDDDC3-FF83-4D95-91DC-4999ADB836DF rpiboot
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A E88B3641-48BB-4D5F-892B-A08B075E6E9F uboot_a
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 8452CDDF-4B7C-42FA-ACB9-1B279553D720 uboot_b
PART: BB499290-B57E-49F6-BF41-190386693794 4ECCB503-5551-490F-B5D5-9D0BDCAC95D7 boot_a
PART: BB499290-B57E-49F6-BF41-190386693794 E579E168-002E-453B-8161-BE9FE76B1390 boot_b
PART: 0F2778C4-5CC1-4300-8670-6C88B7E57ED6 89313297-5363-4E37-B15E-AC87A1F19379 system_a
PART: 0F2778C4-5CC1-4300-8670-6C88B7E57ED6 28A6E00E-C1C1-481A-A087-142A8933B7C8 system_b
PART: B598858A-5FE3-418E-B8C4-824B41F4ADFC 4A773582-3F2D-4ADD-9AF1-4FAD6D78BB7A vbmeta_a
PART: B598858A-5FE3-418E-B8C4-824B41F4ADFC F60C759C-D9EC-4A65-8A24-3DEB5C214204 vbmeta_b
PART: 6B2378B0-0FBC-4AA9-A4F6-4D6E17281C47 BEB9C837-9DF1-4190-BED5-5F9DAD2AF268 misc
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 1EBD7A2C-2D8C-4CF2-BDED-535219A036AB vendor_a
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 2039ACAF-EE39-45E2-A02A-5C5927C5AC43 vendor_b
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 39E6FD6B-58EB-4FB1-BDC5-378B8D888A5E oem_bootloader_a
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 BB63D919-A323-477E-B4ED-DB62C707C6C0 oem_bootloader_b
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 AD200506-D3FC-4CB2-AADD-0E1BDABEFB1C frp
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 59501E98-7479-4BBF-82B6-0585BBFE83EB swap
PART: 0BB7E6ED-4424-49C0-9372-7FBAB465AB4C 9037D9DF-4E2A-4E1F-8C4E-ADBDDBBF2F71 userdata
Boot failed. The first partition was empty; mounting out/.../rpiboot.img showed nothing inside either.

18:04:05 prebuilts/build-tools/linux-x86/bin/ninja 
[prebuilts/build-tools/linux-x86/bin/ninja -d keepdepfile rpibootimage -j 10 -f out/combined-rpi3.ninja -v -w dupbuild=err]
[100% 1/1] 
/bin/bash -c "(echo \"Target rpiboot fs image: out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img\" ) 
&& (mkdir -p out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/ ) 
&& (dd if=/dev/zero of=out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img bs=\$((1024*1024)) count=64 ) 
&& (mkfs.fat -n \"rpiboot\" out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img ) 
&& (for item in  out/target/product/rpi3/rpiboot/bootcode.bin out/target/product/rpi3/rpiboot/fixup_cd.dat 
out/target/product/rpi3/rpiboot/fixup.dat 
out/target/product/rpi3/rpiboot/fixup_db.dat 
out/target/product/rpi3/rpiboot/fixup_x.dat 
out/target/product/rpi3/rpiboot/start_cd.elf 
out/target/product/rpi3/rpiboot/start_db.elf 
out/target/product/rpi3/rpiboot/start.elf 
out/target/product/rpi3/rpiboot/start_x.elf 
out/target/product/rpi3/rpiboot/issue.txt 
out/target/product/rpi3/rpiboot/LICENCE.broadcom 
out/target/product/rpi3/rpiboot/LICENSE.oracle 
out/target/product/rpi3/rpiboot/SHA1SUM 
out/target/product/rpi3/rpiboot/cmdline.txt 
out/target/product/rpi3/rpiboot/config.txt 
out/target/product/rpi3/rpiboot/u-boot-dtok.bin 
out/target/product/rpi3/rpiboot/uboot.env 
out/target/product/rpi3/rpiboot/overlays/chosen-serial0.dtbo 
out/target/product/rpi3/rpiboot/overlays/rpi-uart-skip-init.dtbo 
out/target/product/rpi3/rpiboot/bcm2710-rpi-3-b.dtb 
out/target/product/rpi3/rpiboot/bcm2710-rpi-3-b-plus.dtb 
out/target/product/rpi3/rpiboot/overlays/bcm2710-rpi-3-b-android-fstab.dtbo 
out/target/product/rpi3/rpiboot/overlays/bcm2710-rpi-3-b-cpufreq.dtbo 
out/target/product/rpi3/rpiboot/overlays/pwm-2chan.dtbo 
out/target/product/rpi3/rpiboot/overlays/sdtweak.dtbo 
out/target/product/rpi3/rpiboot/overlays/vc4-kms-v3d.dtbo; 
do if [ \"\`dirname \${item}\`\" = \"out/target/product/rpi3/rpiboot\" ] ;
 then build/make/tools/fat16copy.py out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img \${item} ; 
fi ; done ) 
&& (for item in overlays; do build/make/tools/fat16copy.py 
out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img 
out/target/product/rpi3/rpiboot/\${item} ; done ) 
&& (echo \"Install rpiboot fs image: out/target/product/rpi3/rpiboot.img\" ) 
&& (prebuilts/build-tools/linux-x86/bin/acp out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img out/target/product/rpi3/rpiboot.img )"
Target rpiboot fs image: out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img
Manually mounting SD card partition 1 and copying over all the files under out/.../rpiboot/ makes it boot.
So the guess is that the fat16copy.py modification did not take.

Checking whether mkfs.fat has an option related to reserved sectors:
mkfs.fat 4.1 (2017-01-24)
No device specified.
Usage: mkfs.fat [-a][-A][-c][-C][-v][-I][-l bad-block-file][-b backup-boot-sector]
       [-m boot-msg-file][-n volume-name][-i volume-id]
       [-s sectors-per-cluster][-S logical-sector-size][-f number-of-FATs]
       [-h hidden-sectors][-F fat-size][-r root-dir-entries][-R reserved-sectors]
       [-M FAT-media-byte][-D drive_number]
       [--invariant]
       [--help]
       /dev/name [blocks]
Also took a look at build/make/tools/fat16copy.py, in main:
    print("Usage: fat16copy.py   [ ...]")
    print("Files are copied into the root of the image.")
    print("Directories are copied recursively")
So it is just a convenient tool for copying files straight into an image file, no loop mount needed.

======== See "reserved sector = 1" above.
So the fix is:
device/brobwind/rpi3$ git diff
diff --git a/build/tasks/rpiboot.mk b/build/tasks/rpiboot.mk
index 10d3d74..871633f 100644
--- a/build/tasks/rpiboot.mk
+++ b/build/tasks/rpiboot.mk
@@ -73,7 +73,7 @@ unique_rpiboot_copy_files_destinations_dirs := $(filter-out .,$(patsubst %/,%,$(
 define build-rpibootimage-target
        mkdir -p $(dir $(1))
        dd if=/dev/zero of=$(1) bs=$$((1024*1024)) count=$(2)
-       mkfs.fat -n "rpiboot" $(1)
+       mkfs.fat -a -R1 -n "rpiboot" $(1)
        for item in $(ALL_INSTALLED_RPIBOOT_FILES); do \
                if [ "`dirname $${item}`" = "$(RPIBOOT_OUT_ROOT)" ] ; then \
                        $(FAT16COPY) $(1) $${item} ; \

2019/11/14

Putting a driver firmware .bin file into the kernel

Looking at a relatively simple(?) driver that uses firmware: WHITEHEAT.
It is a USB serial converter, so it is found under the USB client drivers (USB EZ...).
After enabling it, the make log:
make -f /home/charles-chang/mt2712robot/kernel-4.9/scripts/Makefile.build obj=firmware
  FWNAME="whiteheat_loader.fw"; FWSTR="whiteheat_loader_fw"; ASM_WORD=.quad; ASM_ALIGN=3; PROGBITS=@progbits;
 echo "/* Generated by firmware/Makefile */"               > firmware/whiteheat_loader.fw.gen.S;
 echo "    .section .rodata"                               >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_${FWSTR}_bin:"                          >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .incbin \"firmware/whiteheat_loader.fw\""                               >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_end:"                                   >>firmware/whiteheat_loader.fw.gen.S;
 echo "   .section .rodata.str,\"aMS\",${PROGBITS},1"      >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_${FWSTR}_name:"                         >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .string \"$FWNAME\""                    >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .section .builtin_fw,\"a\",${PROGBITS}" >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_${FWSTR}_name"          >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_${FWSTR}_bin"           >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_end - _fw_${FWSTR}_bin" >>firmware/whiteheat_loader.fw.gen.S;
  gcc -Wp,-MD,firmware/.ihex2fw.d -Ifirmware -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89
 -o firmware/ihex2fw /home/charles-chang/mt2712robot/kernel-4.9/firmware/ihex2fw.c
This converts the *.HEX files under firmware/ into binaries and places them in a read-only (.rodata) section.
They become part of the kernel, so the section name (label) gives access to the location of the blob.

But this is a rather bad approach.

Another way to do the same thing is via menuconfig, without touching the kernel source.
ref: Usage and Mechanism of kernel function "request_firmware()"

menuconfig -- Device Drivers -- Generic ... -- Firmware ...
One entry there is the firmware list. Since the default firmware directory is firmware/, just put the .bin file under firmware/,
then write its filename into that FIRMWARE option in menuconfig.

This avoids changing the kernel Makefile and converting the .bin file to HEX.

2019/11/8

Nvidia DIGITS.

Since nvidia/caffe is tied to DIGITS, giving DIGITS a try.
The documentation uses the old nvidia-docker;
trying it with the new version:
~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvidia/digits              
Then open ip:5000 in a browser, and there it is...
Newer images apparently have to be pulled from NGC, which is a bit of a hassle...

The dataset must be downloaded manually; the dataset folder is then specified in the web UI, and everything else can be done on the web page.
See this article, or the official docs, for clearer step-by-step instructions.

Since images are uploaded from the browser on a client PC, it is still worth specifying a volume mapping when running the image...

There are some official training examples,
including a semantic segmentation example.

Following along to create the VOC dataset, the size stubbornly stayed at 0.
Exported the log to look; at the beginning there is:
libdc1394 error: Failed to initialize libdc1394
2019-11-08 09:03:39 [INFO ] Created features db for stage train_db in train_db/features
2019-11-08 09:03:42 [INFO ] Created labels db for stage train_db in train_db/labels
2019-11-08 09:03:42 [INFO ] Processed 1/46
2019-11-08 09:03:42 [INFO ] Processed 2/46
2019-11-08 09:03:42 [INFO ] Processed 3/46
Then found this article:
# ln /dev/null /dev/raw1394
The error no longer appears in the log, but the page still shows size 0.
Switching to the Datasets tab and clicking VOC to open its page, it is no longer 0; following the instructions to export the db, the images show up...

Next, run the script that produces the pretrained parameters (caffemodel),
customize the network (cut and paste),
and training fails with:
ERROR: Cannot copy param 0 weights from layer 'fc6'; 
shape mismatch. Source param shape is 1 1 4096 9216 (37748736); 
target param shape is 4096 256 6 6 (37748736). 
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
Googling the error leads to Stack Overflow:
Rename the layer from "loss1/classifier" to "loss1/classifier_retrain".

When fine-tuning a model, here's what Caffe does:

# pseudo-code
for layer in new_model:
  if layer.name in old_model:
    new_model.layer.weights = old_model.layer.weights

You're getting an error because the weights for "loss1/classifier" were for a 1000-class classification problem (1000x1024), 
and you're trying to copy them into a layer for a 6-class classification problem (6x1024). 
When you rename the layer, Caffe doesn't try to copy the weights for that layer and you get randomly initialized weights - which is what you want.
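The rule in that pseudo-code can be made concrete with a toy sketch (plain Python dicts standing in for the nets; this is not Caffe's actual API):

```python
import random

def init_from_pretrained(new_layers, old_weights):
    """Toy version of Caffe's fine-tuning rule: copy weights only for
    layers whose names match; every other layer gets fresh weights."""
    model = {}
    for name, size in new_layers.items():
        if name in old_weights:
            if len(old_weights[name]) != size:
                # Same name but different shape: this is the
                # "Cannot copy param ... shape mismatch" situation.
                raise ValueError(
                    "Cannot copy param weights from layer '%s': "
                    "shape mismatch" % name)
            model[name] = old_weights[name]
        else:
            model[name] = [random.random() for _ in range(size)]
    return model

old = {"conv1": [0.1] * 4, "loss1/classifier": [0.2] * 1000}

# Renaming the layer ("loss1/classifier_retrain") sidesteps the copy
# entirely, so it gets randomly initialized weights instead.
new = init_from_pretrained({"conv1": 4, "loss1/classifier_retrain": 6}, old)
print(new["conv1"])                          # copied from old
print(len(new["loss1/classifier_retrain"]))  # 6, randomly initialized
```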
It turned out to be nothing of the sort...
The real cause: converting the AlexNet pretrained model into the FCN-AlexNet pretrained model requires running net_surgery.sh.
The script downloads bvlc_alexnet.XX and needs fcn_alexnet.deploy.prototxt as a reference.
That fcn_alexnet.deploy.prototxt lives in DIGITS under examples/semantic-segmentation, so clone DIGITS and copy it into the directory where net_surgery.sh runs.
Only then is fcn_alexnet.caffemodel produced correctly.


The Docker versions on GitHub only go up to release 16, so the GAN model from the examples is not installed.
The repo referenced in the NVIDIA Deep Learning DIGITS documentation has releases up to 19.x.

Following the instructions:
docker pull nvcr.io/nvidia/digits:19.10-tensorflow
Then start it as above:
~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvcr.io/nvidia/digits:19.10-tensorflow
Then open the browser on port 5000 to reach the DIGITS web UI; under New Image on the right, GAN is now available.

2019/11/7

ffmpeg cuda -- scale

From the ffmpeg HWAccelIntro wiki, on using a CUDA scaler for resizing:
NVDEC/CUVID

NVDEC offers decoders for H.264, HEVC, MJPEG, MPEG-1/2/4, VP8/VP9, VC-1. Codec support varies by hardware (see the ​GPU compatibility table).

Note that FFmpeg offers both NVDEC and CUVID hwaccels. They differ in how frames are decoded and forwarded in memory.

The full set of codecs being available only on Pascal hardware, which adds VP9 and 10 bit support. The note about missing ffnvcodec from NVENC applies for NVDEC as well.

Sample decode using NVDEC:
ffmpeg -hwaccel nvdec input output
Sample decode using CUVID:
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -i input output
Full hardware transcode with CUVID and NVENC:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input -c:v h264_nvenc -preset slow output
If ffmpeg was compiled with support for libnpp, it can be used to insert a GPU based scaler into the chain:
ffmpeg -hwaccel_device 0 -hwaccel cuvid -i input -vf scale_npp=-1:720 -c:v h264_nvenc -preset slow output.mkv
The -hwaccel_device option can be used to specify the GPU to be used by the hwaccel in ffmpeg.

In practice:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i mvideo.mp4 -vf scale_npp=2048:-1 -c:v h264_nvenc t1.mp4
With -c:v h264_cuvid on the input side, -c:v h264_nvenc must also be added just before the output filename,
otherwise the error "[FFmpeg-user] Error: Impossible to convert between the formats supported by the filter..." appears.
With CUDA, the conversion rate is 2.5x;
without CUDA, on the CPU, it is 0.8x.
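To avoid forgetting the output-side encoder, the working command line above can be assembled by a tiny helper (a sketch; `cuda_scale_cmd` is a made-up name, and it only builds the argument list):

```python
import subprocess

def cuda_scale_cmd(src, dst, width=2048):
    # Both the decoder (h264_cuvid) and the encoder (h264_nvenc) must be
    # forced, or ffmpeg reports "Impossible to convert between the
    # formats supported by the filter".
    return ["ffmpeg",
            "-hwaccel", "cuvid", "-c:v", "h264_cuvid", "-i", src,
            "-vf", "scale_npp=%d:-1" % width,
            "-c:v", "h264_nvenc", dst]

cmd = cuda_scale_cmd("mvideo.mp4", "t1.mp4")
print(" ".join(cmd))
# To actually transcode: subprocess.run(cmd, check=True)
```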

ffmpeg with CUDA support -- nvenc API version mismatch

Building ffmpeg with CUDA support following the earlier entry, the transcoding test failed with:
Driver does not support the required nvenc API version. Required: 9.1 Found: 8.1
According to this article, it comes down to the headers:

In nv-codec-headers, look at the tags and branches.
Check out sdk/8.1.
make uninstall to remove the previous install.
make && sudo make install

Then re-run configure and make for ffmpeg;
after that everything is OK.


Some old articles mention nvresize, a hardware-accelerated scaling feature that required an NVIDIA patch; it can no longer be found anywhere.
This post seems to explain things, starting from...

1. Hardware-accelerated encoders: In the case of NVIDIA, NVENC is supported and implemented via the h264_nvenc and the hevc_nvenc wrappers. See this answer on how to tune them, and any limitations you may run into depending on the generation of hardware you're on.

2. Hardware-accelerated filters: Filters that perform duties such as scaling and post-processing (deinterlacing, etc) are available in FFmpeg, and some implementations are hardware-accelerated. For NVIDIA, the following filters can take advantage of hardware-acceleration:

(a). scale_cuda: This is a scaling filter analogous to the generic scale filter, implemented in CUDA. Its dependency is the ffnvcodec project, the headers also needed to enable the NVENC-based encoders. When the ffnvcodec headers are present, the respective filters dependent on it (scale_cuda and yadif_cuda) will be automatically enabled. In production, it may be wise to deprecate this filter in favor of scale_npp as it has a very limited set of options.

(b). scale_npp: This is a scaling filter implemented in NVIDIA's Performance Primitives. Its primary dependency is the CUDA SDK, and it must be explicitly enabled by passing --enable-libnpp, --enable-cuda-nvcc and --enable-nonfree flags to ./configure at compile time when building FFmpeg from source. Use this filter in place of scale_cuda wherever possible.

(c). yadif_cuda: This is a deinterlacer, implemented in CUDA. Its dependency, as stated above, is the ffnvcodec package of headers.

(d). All OpenCL-based filters: All NVENC-capable GPUs supported by both the mainline NVIDIA driver and the CUDA SDK implement OpenCL support. I started this section with this clarification because there's news in the wind that NVIDIA will be deprecating mobile Kepler GPUs in their mainline driver, relegating them to Legacy support status. For this reason, if you're on such a platform, take this into consideration.

To enable these filters, pass --enable-opencl to FFmpeg's ./configure script at build time. Note that this requires the OpenCL headers to be present on your system, and can be safely satisfied by your package manager on whatever Linux distribution you're on. On other operating systems, your mileage may vary.

To see all OpenCL-based filters, run:
ffmpeg -h filters | grep opencl
A few notable examples being unsharp_opencl,avgblur_opencl, etc. See this wiki section for more options.
A note pertaining to performance with OpenCL filters: Please take into account any overheads that mechanisms introduced by filter chains such as hwupload and hwdownload may introduce into your pipeline, as uploading textures to and from system memory and the accelerator in question will affect performance, and so will format conversion operations (via the format filter) where needed/required. In this case, it may be beneficial to take advantage of the hwmap filter, and deriving contexts where applicable. For instance, VAAPI has a mechanism that allows for OpenCL device derivation and reverse mapping via hwmap, if the cl_intel_va_api_media_sharing OpenCL extension is present. This is typically provided by the Beignet ICD, and is absent in others, such as the newer Neo OpenCL driver.

3. Hardware-accelerated decoders (and their associated wrappers): Depending on your input source, and the capabilities of your NVIDIA GPU, based on generation, you may also tap into hardware accelerations based on either CUVID or NVDEC. These methods differ in how they handle textures in-flight on the accelerator, and it is wise to evaluate other factors, such as VRAM utilization, when they are in use. Typically, you can take advantage of the CUVID-based hwaccels for operations such as deinterlacing, if so desired. See their usage via:
ffmpeg -h decoder=h264_cuvid
ffmpeg -h decoder=hevc_cuvid
ffmpeg -h decoder=mpeg2_cuvid
However, beware that handling MBAFF encoded content with these decoders, where double deinterlacing is required, is not advisable as NVIDIA has not yet implemented MBAFF support in the backend. Take a look at this thread for more on the same.

In closing: It is wise to evaluate where and when hardware accelerated offloading (filtering, encoding and decoding) offers an advantage or an acceptable trade-off (in quality, feature support and reliability) in your pipeline prior to deployment in production. This is a vendor-neutral approach when deciding what and when to offload parts of your pipeline, and the same applies to NVIDIA's solutions.

For more information, refer to the hardware acceleration entry in FFmpeg's wiki.

Samples demonstrating the use of hardware-accelerated filtering, encoding and decoding based on the notes above:

1. Demonstrate the use of 1:N encoding with NVENC:

The following assumption is made: The test-bed only has one NVENC-capable GPU present, a simple GTX 1070. For this reason I'm limited to two simultaneous NVENC sessions, and that is taken into account with the snippets below. Be warned that cases needing to utilize multiple NVENC-capable GPUs will need the command line(s) modified as appropriate.

My sample files are in ~/Desktop/src

I'll be working with a sample file as shown below:
ffprobe -i deint-testfile.mkv -show_format -hide_banner -show_streams

2019/11/4

CUDA docker images... again, to get the nvcc command for development.

Last time, inside nvidia/cuda:10.1-base, nvidia-smi was OK but there was no nvcc command.
After installing the Toolkit by hand, ffmpeg built OK, but at run time libavcodec.so failed to load, complaining that the nvidia driver version was wrong (it was actually correct).

The cuda container tags list shows the contents of every image.
Each link leads to the corresponding Dockerfile.
The guess: only nvidia/cuda:10.1-devel has nvcc.
Testing:
docker run --gpus all -it nvidia/cuda:10.1-devel bash
..
:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
/usr/local also contains a cuda-10.1 directory.

Same result, though: libnvidia-encode.so is still not found.

Searching on the host, it is under /usr/lib/:
:/usr/lib$ sudo find . -type f -name 'libnvidia-en*'
./i386-linux-gnu/libnvidia-encode.so.435.21
./x86_64-linux-gnu/libnvidia-encode.so.435.21

Maybe this article applies, with its CAPABILITIES option
-- no use; that was for the old nvidia-docker.

This Dockerfile might be worth a try...

Finally, in NVIDIA Docker's own documentation, under Usage:
For a Dockerfile using the NVIDIA Video Codec SDK, you should use:

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,video,utility
So, following the instructions, download the Dockerfile for cuda:10.1-base.
Add video to NVIDIA_DRIVER_CAPABILITIES.
Put it in a folder, and in that folder run:
docker build -t cudaffmpeg .
Use docker images to check that cudaffmpeg was created.
Result: libnvidia-encode.so is there now, but nvcc is not...

The devel Dockerfile should be the reference...
so, merge the two Dockerfiles?
Actually, the FROM relationships between these tags are:
base -- runtime -- devel
So devel can be used directly, with -e overriding the Dockerfile's ENV NVIDIA_DRIVER_CAPABILITIES:
docker run -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility nvidia/cuda:10.1-devel bash
Tested: nvcc, nvidia-smi, and libnvidia-encode.so are all present.

Following the previous two entries, building ffmpeg inside a container started from this image, NVIDIA hardware acceleration works.

2019/11/1

worklog -- prepare docker container for ffmpeg cuda

Testing ffmpeg cuda with docker..
docker
  • run as a daemon, log in via ssh
  • the ssh login uid/gid must match the host side
  • use a host folder
  • must support cuda
To support cuda, use nvidia's docker image,
and start it with "--gpus all".
docker run -idt --gpus all -v ~/dockerfolder:/dockerfolder -p 8022:22 nvidia/cuda:10.1-base
docker exec -it container-name bash
Once inside...
apt-get update
apt-get install openssh-server vim sudo
/etc/init.d/ssh start
addgroup --gid 1001 myname
adduser --uid 1001 --gid 1001 myname
vi /etc/group  -- add myname to the sudo and root groups
After this, ssh -p 8022 hostname login already works..

Install the needed compiler and tools:
sudo apt-get install build-essential pkg-config

Install the cuda sdk (Toolkit):
chmod a+x cuda_10.1.243_418.87.00_linux.run
sudo ./cuda_10.1.243_418.87.00_linux.run
...
Then, per the installer's instructions, add the bin and .so paths..
/etc/profile.d/cuda101.sh
export PATH=$PATH:/usr/local/cuda-10.1/bin
/etc/ld.so.conf.d/cuda-10.1.conf:
/usr/local/cuda-10.1/targets/x86_64-linux/lib
/usr/local/cuda-10.1/lib64
After editing ld.so.conf.d, run ldconfig once with sudo to update the ld cache.
As for PATH: without a reboot, source the profile script once manually.

Follow nvidia's ffmpeg guide: ffmpeg has split out the headers nvidia needs, so fetch them from nvidia:
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
make
sudo make install

Start downloading and building ffmpeg...
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
git checkout n4.2.1
git checkout -b 4.2.1

./configure --prefix=/dockerfolder/cudaffmpeg --enable-cuda-nvcc --enable-cuvid --enable-nvenc
 --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include
 --extra-ldflags=-L/usr/local/cuda/lib64
make -j10
make install
This installs under /dockerfolder/cudaffmpeg,
so the shared-library path has to be added as well:
export LD_LIBRARY_PATH=/dockerfolder/cudaffmpeg/lib
Then it can be tested...
:/dockerfolder/cudaffmpeg/bin$ ./ffmpeg -version
ffmpeg version N-95607-gb414cff630 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration: --prefix=/dockerfolder/cudaffmpeg --enable-cuda-nvcc --enable-cuvid --enable-nvenc
 --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include
 --extra-ldflags=-L/usr/local/cuda/lib64
libavutil      56. 35.101 / 56. 35.101
libavcodec     58. 60.100 / 58. 60.100
libavformat    58. 33.100 / 58. 33.100
libavdevice    58.  9.100 / 58.  9.100
libavfilter     7. 66.100 /  7. 66.100
libswscale      5.  6.100 /  5.  6.100
libswresample   3.  6.100 /  3.  6.100

Test: a 20 sec mp4 transcoded to h264
time ~/cudaffmpeg/bin/ffmpeg -i testvideo.mp4 -an -vcodec h264_nvenc  testvideo.h264
...
real    0m4.157s
user    0m14.736s
sys     0m0.214s

time ~/cudaffmpeg/bin/ffmpeg -i testvideo.mp4 -an -vcodec libx264  testvideo0.h264
...
real    0m24.757s
user    2m19.679s
sys     0m2.956s
So roughly 6x faster in wall-clock time and about 9.5x in CPU time, close to an order of magnitude...
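As a sanity check on that estimate, the two time outputs above can be compared directly (a small sketch; the numbers are copied from the runs above):

```python
# Compare the two `time` results above: h264_nvenc vs libx264.
nvenc_real = 4.157            # seconds, wall-clock
x264_real = 24.757
nvenc_user = 14.736           # seconds of CPU time
x264_user = 2 * 60 + 19.679   # 2m19.679s

wall_speedup = round(x264_real / nvenc_real, 1)
cpu_speedup = round(x264_user / nvenc_user, 1)
print(wall_speedup, cpu_speedup)  # -> 6.0 9.5
```

So the "10x" impression comes mostly from the CPU-time column; the wall-clock gain on this clip is about 6x.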

Some other general ffmpeg options..
scale:
ffmpeg -i testvideo.h264 -vf scale=320:240 -vcodec h264_nvenc test320.h264

Running ffmpeg nvenc in this docker container failed, with libnvidia-encode.so not found. The fix is the method in the article above: add the capability parameter to enable video.

2019/10/31

linux dtsi, gpio

In the dts, the touch ic's device-tree node has this property..
              irq-gpios = <&pio 2 0x2800>;

include/linux/of_gpio.h:
static inline int of_get_named_gpio(struct device_node *np,
                                   const char *propname, int index)
{
        return of_get_named_gpio_flags(np, propname, index, NULL);
}
drivers/gpio/gpiolib-of.c
int of_get_named_gpio_flags(struct device_node *np, const char *list_name,
                            int index, enum of_gpio_flags *flags)
{
        struct gpio_desc *desc;

        desc = of_get_named_gpiod_flags(np, list_name, index, flags);

        if (IS_ERR(desc))
                return PTR_ERR(desc);
        else
                return desc_to_gpio(desc);
}
...
..
struct gpio_desc *of_get_named_gpiod_flags(struct device_node *np,
                     const char *propname, int index, enum of_gpio_flags *flags)
{
        struct of_phandle_args gpiospec;
        struct gpio_chip *chip;
        struct gpio_desc *desc;
        int ret;

        ret = of_parse_phandle_with_args(np, propname, "#gpio-cells", index,
                                         &gpiospec);
        if (ret) {
                pr_debug("%s: can't parse '%s' property of node '%s[%d]'\n",
                        __func__, propname, np->full_name, index);
                return ERR_PTR(ret);
        }

        chip = of_find_gpiochip_by_xlate(&gpiospec);
        if (!chip) {
                desc = ERR_PTR(-EPROBE_DEFER);
                goto out;
        }

        desc = of_xlate_and_get_gpiod_flags(chip, &gpiospec, flags);
        if (IS_ERR(desc))
                goto out;

        pr_debug("%s: parsed '%s' property of node '%s[%d]' - status (%d)\n",
                 __func__, propname, np->full_name, index,
                 PTR_ERR_OR_ZERO(desc));

out:
        of_node_put(gpiospec.np);

        return desc;
}
The latter part (the two-cell xlate) is implemented in kernel/irq/irqdomain.c:
int irq_domain_xlate_twocell(struct irq_domain *d, struct device_node *ctrlr,
                        const u32 *intspec, unsigned int intsize,
                        irq_hw_number_t *out_hwirq, unsigned int *out_type)
{
        if (WARN_ON(intsize < 2))
                return -EINVAL;
        *out_hwirq = intspec[0];
        *out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
        return 0;
}
EXPORT_SYMBOL_GPL(irq_domain_xlate_twocell);
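As a rough illustration (a Python sketch, not kernel code) of what the two-cell translation does with the cells from irq-gpios = <&pio 2 0x2800>: cell 0 becomes the hardware number, and cell 1 is masked with IRQ_TYPE_SENSE_MASK (0x0000000f in the kernel) to get the trigger type:

```python
# Python sketch of irq_domain_xlate_twocell(); 0x2800 is the
# vendor-specific GPIO flag value from the dts property above.
IRQ_TYPE_SENSE_MASK = 0x0000000f  # low 4 bits select edge/level sense

def xlate_twocell(intspec):
    if len(intspec) < 2:              # WARN_ON(intsize < 2)
        raise ValueError("need at least two cells")
    hwirq = intspec[0]                           # *out_hwirq = intspec[0]
    irq_type = intspec[1] & IRQ_TYPE_SENSE_MASK  # *out_type = intspec[1] & mask
    return hwirq, irq_type

print(xlate_twocell([2, 0x2800]))  # -> (2, 0): pin 2, sense bits all zero
```

Note that 0x2800 has no bits inside the sense mask, so the type comes out 0 (IRQ_TYPE_NONE); the high bits only mean something to the vendor's GPIO xlate.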

2019/10/28

ubuntu 18.04 - disable gui on boot and login page

Switch the login screen to console:
sudo systemctl set-default multi-user.target

Switch boot to console:
Edit /etc/default/grub, then run update-grub to propagate it to /boot/grub.
#GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
...
GRUB_TERMINAL=console

2019/10/25

IBM UltraNav: touch/trackpoint failing on Windows 7

Unluckily, this keyboard (bought in 2007) has no Windows 7 driver,
so the touchpad/trackpoint often ends up with the Synaptics driver, which is not compatible.

Whenever touch/trackpoint stops working, the only option is to forcibly remove the Synaptics driver
-- rename C:/Windows/system32/drivers/syntp.sys

Uninstall the Synaptics HID devices (there are two of them).

Then reboot.

2019/10/21

conda, caffe

conda create -n caffe2.7
conda activate caffe2.7

Then install the following packages..
python=2.7
numpy
matplotlib
scikit-image
pyyaml
protobuf
plus jupyter as an extra

Test result: with protobuf installed via conda it is libprotobuf.so.2; make is OK, but running .build_release/tools/caffe errors out, unable to find libprotobuf.so.2.
After export LD_LIBRARY_PATH=... it runs.
Without installing it, the system's libprotobuf.so.1 works fine.

When importing caffe in python, however, without conda install protobuf it will not fall back to the system's libprotobuf.so.1,
so it has to be installed anyway.

Once installed, build and runtest are OK. Then copy Makefile.config.example to Makefile.config and modify it:
@@ -2,7 +2,7 @@
 # Contributions simplifying and improving our build system are welcome!

 # cuDNN acceleration switch (uncomment to build with cuDNN).
-# USE_CUDNN := 1
+USE_CUDNN := 1

 # CPU-only switch (uncomment to build without GPU support).
 # CPU_ONLY := 1
@@ -20,7 +20,7 @@
 # ALLOW_LMDB_NOLOCK := 1

 # Uncomment if you're using OpenCV 3
-# OPENCV_VERSION := 3
+OPENCV_VERSION := 3

 # To customize your choice of compiler, uncomment and set the following.
 # N.B. the default for Linux is g++ and the default for OSX is clang++
@@ -36,9 +36,7 @@
 # For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
 # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
 # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
-CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-               -gencode arch=compute_20,code=sm_21 \
-               -gencode arch=compute_30,code=sm_30 \
+CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52 \
@@ -68,34 +66,34 @@

 # NOTE: this is required only if you will compile the python interface.
 # We need to be able to find Python.h and numpy/arrayobject.h.
-PYTHON_INCLUDE := /usr/include/python2.7 \
+#PYTHON_INCLUDE := /usr/include/python2.7 \
                /usr/lib/python2.7/dist-packages/numpy/core/include
 # Anaconda Python distribution is quite popular. Include path:
 # Verify anaconda location, sometimes it's in root.
-# ANACONDA_HOME := $(HOME)/anaconda
-# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
-               # $(ANACONDA_HOME)/include/python2.7 \
-               # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include
+ANACONDA_HOME := $(HOME)/miniconda3/envs/caffe2.7
+PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
+                $(ANACONDA_HOME)/include/python2.7 \
+                $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include

 # Uncomment to use Python 3 (default is Python 2)
-# PYTHON_LIBRARIES := boost_python3 python3.5m
-# PYTHON_INCLUDE := /usr/include/python3.5m \
-#                 /usr/lib/python3.5/dist-packages/numpy/core/include
+#PYTHON_LIBRARIES := boost_python3 python3.6m
+#PYTHON_INCLUDE := /usr/include/python3.6m \
+                 /usr/lib/python3.6/dist-packages/numpy/core/include

 # We need to be able to find libpythonX.X.so or .dylib.
-PYTHON_LIB := /usr/lib
-# PYTHON_LIB := $(ANACONDA_HOME)/lib
+#PYTHON_LIB := /usr/lib
+PYTHON_LIB := $(ANACONDA_HOME)/lib

 # Homebrew installs numpy in a non standard path (keg only)
 # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
 # PYTHON_LIB += $(shell brew --prefix numpy)/lib

 # Uncomment to support layers written in Python (will link against Python libs)
-# WITH_PYTHON_LAYER := 1
+WITH_PYTHON_LAYER := 1

 # Whatever else you find you need goes here.
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
-LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
+LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial

 # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
 # INCLUDE_DIRS += $(shell brew --prefix)/include
@@ -107,7 +105,7 @@

 # Uncomment to use `pkg-config` to specify OpenCV library paths.
 # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
-# USE_PKG_CONFIG := 1
+USE_PKG_CONFIG := 1

 # N.B. both build and distribute dirs are cleared on `make clean`
 BUILD_DIR := build
..the original example already accounts for a conda environment, so just point ANACONDA_HOME at it (even though miniconda is used here)

miniconda -- ubuntu 18.04 for caffe

Miniconda3 will now be installed into this location:
/home/charles-chang/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/charles-chang/miniconda3] >>> 
PREFIX=/home/charles-chang/miniconda3
Unpacking payload ...
Collecting package metadata (current_repodata.json): done                                                                
Solving environment: done

## Package Plan ##

  environment location: /home/charles-chang/miniconda3

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - asn1crypto==1.0.1=py37_0
    - ca-certificates==2019.8.28=0
    - certifi==2019.9.11=py37_0
    - cffi==1.12.3=py37h2e261b9_0
    - chardet==3.0.4=py37_1003
    - conda-package-handling==1.6.0=py37h7b6447c_0
    - conda==4.7.12=py37_0
    - cryptography==2.7=py37h1ba5d50_0
    - idna==2.8=py37_0
    - libedit==3.1.20181209=hc058e9b_0
    - libffi==3.2.1=hd88cf55_4
    - libgcc-ng==9.1.0=hdf63c60_0
    - libstdcxx-ng==9.1.0=hdf63c60_0
    - ncurses==6.1=he6710b0_1
    - openssl==1.1.1d=h7b6447c_2
    - pip==19.2.3=py37_0
    - pycosat==0.6.3=py37h14c3975_0
    - pycparser==2.19=py37_0
    - pyopenssl==19.0.0=py37_0
    - pysocks==1.7.1=py37_0
    - python==3.7.4=h265db76_1
    - readline==7.0=h7b6447c_5
    - requests==2.22.0=py37_0
    - ruamel_yaml==0.15.46=py37h14c3975_0
    - setuptools==41.4.0=py37_0
    - six==1.12.0=py37_0
    - sqlite==3.30.0=h7b6447c_0
    - tk==8.6.8=hbc83047_0
    - tqdm==4.36.1=py_0
    - urllib3==1.24.2=py37_0
    - wheel==0.33.6=py37_0
    - xz==5.2.4=h14c3975_4
    - yaml==0.1.7=had09818_2
    - zlib==1.2.11=h7b6447c_3


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  asn1crypto         pkgs/main/linux-64::asn1crypto-1.0.1-py37_0
  ca-certificates    pkgs/main/linux-64::ca-certificates-2019.8.28-0
  certifi            pkgs/main/linux-64::certifi-2019.9.11-py37_0
  cffi               pkgs/main/linux-64::cffi-1.12.3-py37h2e261b9_0
  chardet            pkgs/main/linux-64::chardet-3.0.4-py37_1003
  conda              pkgs/main/linux-64::conda-4.7.12-py37_0
  conda-package-han~ pkgs/main/linux-64::conda-package-handling-1.6.0-py37h7b6447c_0
  cryptography       pkgs/main/linux-64::cryptography-2.7-py37h1ba5d50_0
  idna               pkgs/main/linux-64::idna-2.8-py37_0
  libedit            pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
  libffi             pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  ncurses            pkgs/main/linux-64::ncurses-6.1-he6710b0_1
  openssl            pkgs/main/linux-64::openssl-1.1.1d-h7b6447c_2
  pip                pkgs/main/linux-64::pip-19.2.3-py37_0
  pycosat            pkgs/main/linux-64::pycosat-0.6.3-py37h14c3975_0
  pycparser          pkgs/main/linux-64::pycparser-2.19-py37_0
  pyopenssl          pkgs/main/linux-64::pyopenssl-19.0.0-py37_0
  pysocks            pkgs/main/linux-64::pysocks-1.7.1-py37_0
  python             pkgs/main/linux-64::python-3.7.4-h265db76_1
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  requests           pkgs/main/linux-64::requests-2.22.0-py37_0
  ruamel_yaml        pkgs/main/linux-64::ruamel_yaml-0.15.46-py37h14c3975_0
  setuptools         pkgs/main/linux-64::setuptools-41.4.0-py37_0
  six                pkgs/main/linux-64::six-1.12.0-py37_0
  sqlite             pkgs/main/linux-64::sqlite-3.30.0-h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.8-hbc83047_0
  tqdm               pkgs/main/noarch::tqdm-4.36.1-py_0
  urllib3            pkgs/main/linux-64::urllib3-1.24.2-py37_0
  wheel              pkgs/main/linux-64::wheel-0.33.6-py37_0
  xz                 pkgs/main/linux-64::xz-5.2.4-h14c3975_4
  yaml               pkgs/main/linux-64::yaml-0.1.7-had09818_2
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3


Preparing transaction: done
Executing transaction: done
installation finished.
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
no change     /home/charles-chang/miniconda3/condabin/conda
no change     /home/charles-chang/miniconda3/bin/conda
no change     /home/charles-chang/miniconda3/bin/conda-env
no change     /home/charles-chang/miniconda3/bin/activate
no change     /home/charles-chang/miniconda3/bin/deactivate
no change     /home/charles-chang/miniconda3/etc/profile.d/conda.sh
no change     /home/charles-chang/miniconda3/etc/fish/conf.d/conda.fish
no change     /home/charles-chang/miniconda3/shell/condabin/Conda.psm1
no change     /home/charles-chang/miniconda3/shell/condabin/conda-hook.ps1
no change     /home/charles-chang/miniconda3/lib/python3.7/site-packages/xontrib/conda.xsh
no change     /home/charles-chang/miniconda3/etc/profile.d/conda.csh
modified      /home/charles-chang/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup, 
   set the auto_activate_base parameter to false: 

conda config --set auto_activate_base false

Thank you for installing Miniconda3!
Then, as before, copy the last block of ~/.bashrc out into condaenv.sh:
#!/bin/bash
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/charles-chang/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/charles-chang/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/home/charles-chang/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/charles-chang/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
Sure enough... although ubuntu 18.04's python3 is python3.6, conda installed python3.7 on top...
So the system now has...
charles-chang@zoeymkII:~$ python --version
Python 2.7.15+
charles-chang@zoeymkII:~$ python3 --version
Python 3.6.8
charles-chang@zoeymkII:~$ source condaenv.sh 
(base) charles-chang@zoeymkII:~$ python --version
Python 3.7.4
(base) charles-chang@zoeymkII:~$
Three versions in total....

The python3.7 that conda installs does not include the dev files, so there is no python3.7 / python3.7m under /usr/include.
If needed, apt-get install libpython3.7-dev.


OK. With that, after source condaenv.sh, conda is ready to use...
First create a new env and activate it..
conda create --name caffe
conda activate caffe
Install python3.6
conda install python=3.6
Install jupyter
conda install jupyter

2019/10/17

[ 5416.045468] usb 4-3.4: USB disconnect, device number 3
[ 7632.747488] usb 4-3.4: new SuperSpeed USB device number 4 using xhci_hcd
[ 7632.768320] usb 4-3.4: New USB device found, idVendor=0bda, idProduct=8153
[ 7632.768325] usb 4-3.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[ 7632.768327] usb 4-3.4: Product: USB 10/100/1000 LAN
[ 7632.768330] usb 4-3.4: Manufacturer: Realtek
[ 7632.768333] usb 4-3.4: SerialNumber: 000001
[ 7632.851707] usb 4-3.4: reset SuperSpeed USB device number 4 using xhci_hcd
[ 7632.928294] r8152 4-3.4:1.0 eth2: v1.08.9
[ 7632.971651] r8152 4-3.4:1.0 enx00e04c680117: renamed from eth2
[ 7633.007431] IPv6: ADDRCONF(NETDEV_UP): enx00e04c680117: link is not ready
[ 7633.037604] IPv6: ADDRCONF(NETDEV_UP): enx00e04c680117: link is not ready
[ 7970.840340] r8152 4-3.4:1.0 enx00e04c680117: carrier on
[ 7970.840548] IPv6: ADDRCONF(NETDEV_CHANGE): enx00e04c680117: link becomes ready

2019/10/1

install phabricator on ubuntu 18.04

Followed this article to install phabricator on ubuntu 18.04.

  • install nginx
  • install mariadb
  • add ppa and install php7.2
  • clone phabricator source
Mostly all OK.
Except...

In step 3, after installing php-fpm, the phpinfo.php test requires editing sites-available/default to uncomment the php section,
and starting php7.2-fpm (see here).

If the test machine is not covered by dns and has to be reached by raw ip, comment out 'server_name' in sites-available/phabricator.

Also, after startup the page said mysql was not responding. Following the instructions, going in to poke at it and then restarting mariadb.service fixed it.


Finally, to auto-start the phd daemon..
ref: phabricator daemon phd systemd services
Below is the /lib/systemd/system/phabricator-phd.service I use:
[Unit]
Description=phabricator-phd
After=syslog.target network.target
Before=nginx.service
  
[Service]
User=root
Group=root
Type=oneshot
Environment="PATH=/sbin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/phabricator/bin/phd start
ExecStop=/opt/phabricator/bin/phd stop
RemainAfterExit=yes
  
[Install]
WantedBy=multi-user.target
sudo systemctl start phabricator-phd.service
sudo systemctl stop phabricator-phd.service
Use ps to check whether it starts/stops.
Once that works:
sudo systemctl enable phabricator-phd.service
The system then creates a link under /etc/systemd/system/multi-user.target.wants,
so it runs automatically when the machine boots.


As for whether fpm should use TCP (127.0.0.1:9000) or a unix socket (/var/run/php5-fpm.sock),
see this article for the trade-offs.
Roughly: a TCP socket makes it easier to split client and server across machines;
a unix socket performs better.
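A minimal sketch of how that choice shows up in the nginx site config (the socket path is assumed to be the Ubuntu 18.04 php7.2-fpm default; adjust to the actual setup):

```nginx
location ~ \.php$ {
    include snippets/fastcgi-php.conf;
    # TCP: easier to move PHP-FPM to another host later
    # fastcgi_pass 127.0.0.1:9000;
    # Unix socket: lower overhead when both run on the same machine
    fastcgi_pass unix:/run/php/php7.2-fpm.sock;
}
```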


Update:

Configuring Phabricator has the complete settings; it differs a bit..

2019/9/25

jupyter notebook : ImportError: No module named 'matplotlib'

Starting jupyter notebook inside a conda environment.
matplotlib was installed with conda install matplotlib,
and starting python in the conda terminal/shell, import matplotlib works,
but the import fails in the jupyter notebook launched from conda.

The jupyter notebook error message showed the wrong python version.
which jupyter pointed at .local/bin/jupyter.

So the system jupyter was being used, not the conda env's jupyter,
and since the conda env's python version differs from the system's, the error occurred.
-- The real root cause is that jupyter was not installed in the conda env.

Installing jupyter inside the conda env fixed it.

conda

miniconda and anaconda seem to use the same commands.

conda is roughly a way to solve python version and package environment problems,
somewhat like venv.

Create a new environment profile:
conda create -n mynewenv
List the existing environment profiles, and see which one is currently active:
conda info -e
Switch to (activate) an environment:
conda activate mynewenv
Packages installed in this environment are visible only inside it.
install package..
 conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
Leave the environment:
conda deactivate
Then, if it is no longer needed, delete the environment, including all its installed packages and the disk space they occupy:
conda remove -n mynewenv --all


On a raspberry pi 3 running 64-bit armbian, the miniconda aarch64 install script dies with illegal instruction.
Until that is fixed, download the corresponding build from miniforge instead.

2019/9/24

Dive into Deep Learning -- install miniconda

Following the instructions, run
bash Miniconda3-latest-Linux-x86_64.sh
After answering yes to everything, it downloads miniconda into a local folder,...
Preparing transaction: done
Executing transaction: - WARNING conda.core.envs_manager:register_env(46): Unable to register environment. 
Path not writable or missing.
  environment location: /home/charles-chang/miniconda2
  registry file: /home/charles-chang/.conda/environments.txt
done
installation finished.
Do you wish the installer to initialize Miniconda2
by running conda init? [yes|no]
[no] >>> yes
no change     /home/charles/miniconda2/condabin/conda
no change     /home/charles/miniconda2/bin/conda
no change     /home/charles/miniconda2/bin/conda-env
no change     /home/charles/miniconda2/bin/activate
no change     /home/charles/miniconda2/bin/deactivate
no change     /home/charles/miniconda2/etc/profile.d/conda.sh
no change     /home/charles/miniconda2/etc/fish/conf.d/conda.fish
no change     /home/charles/miniconda2/shell/condabin/Conda.psm1
no change     /home/charles/miniconda2/shell/condabin/conda-hook.ps1
no change     /home/charles/miniconda2/lib/python2.7/site-packages/xontrib/conda.xsh
no change     /home/charles/miniconda2/etc/profile.d/conda.csh
modified      /home/charles/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup, 
   set the auto_activate_base parameter to false: 

conda config --set auto_activate_base false

Thank you for installing Miniconda2!
Then it modifies .bashrc:

# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/charles/miniconda2/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/charles/miniconda2/etc/profile.d/conda.sh" ]; then
        . "/home/charles/miniconda2/etc/profile.d/conda.sh"
    else
        export PATH="/home/charles/miniconda2/bin:$PATH"
    fi
fi
unset __conda_setup

To avoid trouble, copy this section out as setupminicondaenv.bash

d2l-zh's environment.yml:
name: gluon
dependencies:
- python=3.6
- pip:
  - mxnet-cu100==1.5.0
  - d2lzh==0.8.11
  - jupyter==1.0.0
  - matplotlib==2.2.2
  - pandas==0.23.4
!!python 3.6!!

Before installing the packages and environment d2l-zh needs, edit it to use the GPU build of MXNet.
Per the instructions, check your cuda version with nvidia-smi (10.0 here),
and edit d2l-zh/environment.yml; the listing above is already the modified version (mxnet-cu100).
conda env create -f d2l-zh/environment.yml
then installs the needed packages, and finally..
#
# To activate this environment, use
#
#     $ conda activate gluon
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Per the instructions, conda activate gluon completes the environment setup.

Start jupyter notebook..
Since it is not local, remote access/login has to be enabled.
ref: public server. First..
jupyter notebook --generate-config
This creates a jupyter_notebook_config.py under ~/.jupyter.
Edit that file:
c.NotebookApp.allow_password_change = True
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.password_required = False
After that it can be started:
~$ jupyter notebook
...
...
    To access the notebook, open this file in a browser:
        file:///home/charles/.local/share/jupyter/runtime/nbserver-8678-open.html
    Or copy and paste one of these URLs:
        http://ey-wpc:8888/?token=77091332eb31bb666dc61e4d6d149460fe4e62757c44d157
..
On another computer, open a browser at http://ey-wpc:8888 (ey-wpc is that machine's hostname).
A page asks for a token or a password; paste in the 77091332eb31bb666dc61e4d6d149460fe4e62757c44d157 from above.

To set a password instead (so there is no long token to copy)...
$ jupyter notebook password
Enter password:  ****
Verify password: ****
[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json
This produces a hashed-password file; from then on, jupyter uses this password at startup (remember it yourself).
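A sketch of the legacy hash format that ends up in jupyter_notebook_config.json: "sha1:&lt;salt&gt;:&lt;digest&gt;", with the digest taken over password+salt. This is an inferred re-implementation, not Jupyter's code; the salt below is a made-up example (Jupyter picks a random one), and newer versions use a different scheme:

```python
import hashlib

# Hypothetical re-implementation of the old "sha1:salt:digest" format.
def passwd_sha1(password, salt="12ab34cd56ef"):
    digest = hashlib.sha1((password + salt).encode("utf-8")).hexdigest()
    return "sha1:{}:{}".format(salt, digest)

h = passwd_sha1("secret")
print(h)  # sha1:12ab34cd56ef:<40 hex chars>
```

Only the salted hash is stored, so the plain password cannot be recovered from the file.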

jupyter 7 (and later?) versions need...
jupyter server --generate-config
then edit .jupyter/*config.py,
first changing ip='localhost' to ip='*'
so it can be accessed on a public ip.
Setting a password changes to
jupyter server password
To set the port, add an option at launch:
jupyter notebook --port 8765
Same for not launching a browser: add --no-browser

2019/9/19

Camera Interface : parallel, mipi and lvds

Strictly speaking, all three use multiple data lines, so they all count as parallel transmission.
But the 'parallel' type of camera interface is like an lcd panel: besides data0-7 there are vsync, hsync, clk and so on,
and none of the signals are differential.

LVDS folds vsync/hsync into the data-line coding, so the data frame carries codes such as start-of-frame.
LVDS signals are all differential.

MIPI uses the same signals and pins as LVDS, but different electrical levels.
Besides image data, a MIPI stream also carries some control commands.

2019/9/17

Using docker as a package install/test environment

That is: test the install, run it, then delete it.
Just like a normal machine, but for a network service, plan the port mapping first.
Otherwise, adding it after installation means stop, commit, then a fresh run (with -p added).

Base system, ubuntu 16.04 for example; after docker run -it -p 8022:22 ubuntu:16.04..
  • apt-get update
  • apt-get install openssh-server && /etc/init.d/ssh start
  • adduser MyName
  • apt-get install sudo vim
  • vi /etc/group -- add MyName into root and sudo group
Now the docker container can be logged into..
ssh -p 8022 MyName@dockerhost

Since a local folder mounted with docker -v keeps its uid/gid unchanged,
it is best to assign the uid/gid when adding the user:
addgroup --gid 1000 MyName
adduser --uid 1000 --gid 1000 myName

Run docker in daemon mode first, and use docker exec to start a bash when needed..
docker run -idt -v ~/dockerfolder:/dockerfolder -p 8022:22 nvidia/cuda:10.1-base
docker exec -it container-name bash

2019/9/16

A trivial problem -- pip install imageio on python2.7

pip install imageio fails with an error:
:~$ pip install imageio
Collecting imageio
  Downloading https://files.pythonhosted.org/packages/69/4a/0387d708394d5e25d95b1abe427c301614152d1bebea18d9b06fa7199704/imageio-2.5.0.tar.gz (3.3MB)
    100% |################################| 3.4MB 541kB/s 
Collecting numpy (from imageio)
  Downloading https://files.pythonhosted.org/packages/ac/36/325b27ef698684c38b1fe2e546e2e7ef9cecd7037bcdb35c87efec4356af/numpy-1.17.2.zip (6.5MB)
    100% |################################| 6.5MB 283kB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "", line 1, in 
      File "/tmp/pip-build-o0a1gw/numpy/setup.py", line 31, in 
        raise RuntimeError("Python version >= 3.5 required.")
    RuntimeError: Python version >= 3.5 required.
It turns out imageio 2.5 depends on numpy 1.17,
and numpy 1.17 only supports python3,
so installation on python2.7 fails.

pip install numpy==1.16.0 forces that version and installs OK,
but installing imageio afterwards still demands numpy 1.17.

It turns out...
this depends on how pip itself was installed.

With pip installed the official way, fetching get-pip.py and running it,
pip install imageio correctly installs numpy 1.16 (a recent pip honours the Requires-Python metadata that numpy 1.17 declares),
so it works on python2.7.

But the pip from apt-get install python-pip is too old to check that metadata, so with it, pip install imageio hits the error above.
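The mechanism can be sketched like this: numpy 1.17 publishes Requires-Python ">=3.5", and a pip new enough to read it skips that release on a 2.7 interpreter. This is a simplified illustration (it handles only the ">=X.Y" form; real pip uses full PEP 440 specifiers):

```python
# Simplified version of the Requires-Python check modern pip performs.
def satisfies_requires_python(py_version, spec=">=3.5"):
    # only the ">=X.Y" form is handled in this sketch
    assert spec.startswith(">=")
    need = tuple(int(x) for x in spec[2:].split("."))
    have = tuple(int(x) for x in py_version.split("."))
    return have >= need

print(satisfies_requires_python("2.7"))  # -> False: pip skips numpy 1.17
print(satisfies_requires_python("3.6"))  # -> True
```

The distro's old pip never performs this check, downloads the newest numpy, and only fails later inside numpy's setup.py.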

2019/9/11

Some Dart links..

For the Dart syntax part, following the language tour and typing it through once is enough.
You can also edit and run in the browser with dartpad, no install needed.
The Asynchrony part at the end seems to be Dart's distinctive piece.
And of course there are some 'convenience' features, like..
  • => expression, shorthand for { return expression; }
  • .. cascade, saving the trouble of naming temporary objects (piles of tmpvar)
  • ?. skipping the check for whether an object is null

gcc - extended asm

Clang's change notes mentioned adding asm-goto support, so it can now compile the kernel.
That prompted a look into asm-goto.

It turns out the gcc documentation is already very good:
the special form of writing assembly code inside C with asm finally has a clear explanation.

2019/8/13

caffe train snapshot and resume

After resuming training (with --snapshot=XXX.solverstate), an error appears right after saving a snapshot:
I0813 15:10:02.564018 12537 solver.cpp:635] Iteration 6000, Testing net (#0)
F0813 15:10:02.564225 12537 net.cpp:1081] Check failed: target_blobs[j]->shape() == source_blob->shape() Cannot share param 0 weights from layer 'conv1a/bn'; shape mismatch.  Source param shape is 1 32 1 1 (32); target param shape is 32 (32)
*** Check failure stack trace: ***
    @     0x7fe99628b0cd  google::LogMessage::Fail()
    @     0x7fe99628cf33  google::LogMessage::SendToLog()
    @     0x7fe99628ac28  google::LogMessage::Flush()
    @     0x7fe99628d999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fe996e9b28a  caffe::Net::ShareTrainedLayersWith()
    @     0x7fe99729c883  caffe::Solver::TestDetection()
    @     0x7fe99729f91d  caffe::Solver::TestAll()
    @     0x7fe9972a06ac  caffe::Solver::Step()
    @     0x7fe9972a1fd2  caffe::Solver::Solve()
    @     0x555c3341fe4e  train()
    @     0x555c3341cdb1  main
    @     0x7fe9948b9b97  __libc_start_main
    @     0x555c3341db4a  _start
But without resume (--snapshot), training succeeds.

2019/8/12

caffe-jacinto-models train_image_object_detection.sh

Continuing the previous entry: with the training data prepared, running this script does the training.
The script roughly does this..

Call image_object_detection.py to prepare the parameters and generate five run.sh files:
./initial/run.sh
./l1reg/run.sh
./sparse/run.sh
./test/run.sh
./test_quantize/run.sh

Then run them in order..
#run
list_dirs=`command ls -d1 "$folder_name"/*/ | command cut -f5 -d/`
for f in $list_dirs; do "$folder_name"/$f/run.sh; done
so the run.sh files execute in folder-name (alphabetical) order.

Each run.sh just calls caffe.bin, takes the *.caffemodel produced by the previous stage's run.sh as the initial weights,
trains according to its own solver.prototxt, and produces a *.caffemodel.

Per solver.prototxt, a snapshot is taken every 2000 iters. (caffemodel)

So if train_image_object_detection.sh is interrupted midway,
look at the contents of those folders under training/.../ to see which script ran last.


Since a snapshot is taken every 2000 iters, after an interruption the --snapshot parameter can be used to continue training from the latest snapshot.
Reference: Training and Resuming

But that requires editing run.sh..
For example, to resume the initial stage from iter 16000...
--- run.sh 2019-08-12 13:49:23.741507855 +0800
+++ resume.sh 2019-08-13 09:44:19.540395008 +0800
@@ -1,4 +1,4 @@
 /home/checko/caffe-jacinto/build/tools/caffe.bin train \
 --solver="training/voc0712/JDetNet/20190812_13-49_ds_PSP_dsFac_32_hdDS8_1/initial/solver.prototxt" \
---weights="training/imagenet_jacintonet11v2_iter_320000.caffemodel" \
+--snapshot="training/voc0712/JDetNet/20190812_13-49_ds_PSP_dsFac_32_hdDS8_1/initial/voc0712_ssdJacintoNetV2_iter_16000.solverstate" \
 --gpu "0" 2>&1 | tee training/voc0712/JDetNet/20190812_13-49_ds_PSP_dsFac_32_hdDS8_1/initial/run.log
i.e. the original weights line is replaced by snapshot



After training, to test (inference), running or importing ssd_detect_video.py errors out, unable to find propagate_obj.py.
git log shows it was deleted in some revision.
git checkout 6ca88ff12e559a839ae5ee9bc7c25201f0ed9217 scripts/propagate_obj.py
brings it back.

2019/7/25

caffe-jacinto-models

First follow this README.md.
Clone it at the same level as caffe-jacinto:
 
 ./
 |
  -- caffe-jacinto
 |
 |
  -- caffe-jacinto-models
After cloning, check out the same version (branch) as caffe-jacinto: caffe-0.17

Set the environment variables:
export PYTHONPATH=~/caffe-jacinto/python
export CAFFE_ROOT=~/caffe-jacinto
That completes the installation.
Next, run the script for the example you want to do..

ref:
SSD

scripts/train_image_object_detection.sh is the script that does SSD training with caffe-ssd's training data.
From it you can see it uses caffe-ssd's training data:
if [ $dataset = "voc0712" ]
then
  train_data="../../caffe-jacinto/examples/VOC0712/VOC0712_trainval_lmdb"
  test_data="../../caffe-jacinto/examples/VOC0712/VOC0712_test_lmdb"

  name_size_file="../../caffe-jacinto/data/VOC0712/test_name_size.txt"
  label_map_file="../../caffe-jacinto/data/VOC0712/labelmap_voc.prototxt"

So first follow the caffe-ssd steps:
download the training data into ~/data,
and in caffe-jacinto run create_data.sh and create_list.sh to prepare the VOC0712 data set.
chmod a+x data/VOC0712/*.sh
./data/VOC0712/create_list.sh
./data/VOC0712/create_data.sh
Then run train_image_object_detection.sh under caffe-jacinto-models. Per the script's contents, it must be executed from the scripts directory.
Result:
2019-07-26 11:39:44 (10.2 MB/s) - ‘training/imagenet_jacintonet11v2_iter_320000.caffemodel’ saved [11516054/11516054]

Logging output to training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/train-log_20190726_11-39.txt
training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/initial
Traceback (most recent call last):
  File "./models/image_object_detection.py", line 5, in <module>
    from models.model_libs import *
ImportError: No module named models.model_libs
(the same traceback repeats for the l1reg, sparse, test, and test_quantize stages)
cat: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/deploy.prototxt: No such file or directory
cat: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/test.prototxt: No such file or directory
./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/initial/run.sh: No such file or directory
./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/l1reg/run.sh: No such file or directory
./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/sparse/run.sh: No such file or directory
./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test/run.sh: No such file or directory
./train_image_object_detection.sh: line 382: training/voc0712/JDetNet/20190726_11-39_ds_PSP_dsFac_32_hdDS8_1/test_quantize/run.sh: No such file or directory
This looks like a module-path problem: image_object_detection.py imports the .py modules sitting in its own directory with a `models.` prefix.
Removing the `models.` prefix makes the error go away.
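Why the prefix fails can be reproduced with a throwaway layout (hypothetical file names, not the real caffe-jacinto-models tree): when python runs `models/main.py`, sys.path[0] is the script's own directory, not the working directory, so a `models.` package import has nothing to resolve against while a bare sibling import works.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway tree mimicking scripts/models/ (hypothetical names).
root = tempfile.mkdtemp()
models = os.path.join(root, "models")
os.makedirs(models)
with open(os.path.join(models, "model_libs.py"), "w") as f:
    f.write("VALUE = 42\n")

script = os.path.join(models, "main.py")
# Drop any inherited PYTHONPATH so the experiment is isolated.
env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}

def run(import_line):
    # Write the script, then run it from the parent directory, the way
    # train_image_object_detection.sh runs ./models/image_object_detection.py
    with open(script, "w") as f:
        f.write(import_line + "\nprint(VALUE)\n")
    return subprocess.call([sys.executable, "models/main.py"], cwd=root,
                           env=env, stderr=subprocess.DEVNULL)

# sys.path[0] is the script's directory (models/), not the cwd, so the
# package-style import cannot find any 'models' package:
print(run("from models.model_libs import *") != 0)  # True
# Importing the sibling module directly resolves fine:
print(run("from model_libs import *") == 0)         # True
```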

Next comes: Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal
As before, find every gpus= in the *.sh and *.py files under scripts and change them all to gpus="0".
For example:
diff --git a/scripts/train_image_object_detection.sh b/scripts/train_image_object_detection.sh
index 0e5f99d..f9771f4 100755
--- a/scripts/train_image_object_detection.sh
+++ b/scripts/train_image_object_detection.sh
@@ -5,7 +5,7 @@ DATE_TIME=`date +'%Y%m%d_%H-%M'`
 #-------------------------------------------------------
 
 #------------------------------------------------
-gpus="0,1" #"0,1,2"
+gpus="0" #"0,1,2"


diff --git a/scripts/models/image_object_detection.py b/scripts/models/image_object_detection.py
index 67823a1..f625615 100644
--- a/scripts/models/image_object_detection.py
+++ b/scripts/models/image_object_detection.py
@@ -360,7 +360,7 @@ def main():
     # Which layers to freeze (no backward) during training.
     config_param.freeze_layers = []
     # Defining which GPUs to use.
-    config_param.gpus = "0,1" #gpus = "0"  
+    config_param.gpus = "0" #gpus = "0"  
 
     config_param.batch_size = 32
     config_param.accum_batch_size = 32

I pushed my modified version to https://github.com/checko/caffe-jacinto-models/tree/zoey-wpc
Also watch GPU memory: batch_size in train_image_object_detection.sh is 16, which just fits an 8 GB card.
With only 4 GB of GPU memory, change it to 8.
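Both recurring crashes come from hard-coded settings, so they can be wrapped in two tiny guards. These helpers are hypothetical (not part of the scripts); they just encode the fixes above: drop GPU ids that don't exist on the machine, and pick batch_size by the rule of thumb that 16 fits 8 GB and 8 fits 4 GB.

```python
# Hypothetical helpers encoding the two fixes above.
def clamp_gpus(gpus, available):
    # Keep only device ids that actually exist; avoids the
    # "invalid device ordinal" crash on machines with fewer GPUs.
    ids = [g for g in gpus.split(",") if int(g) < available]
    return ",".join(ids) or "0"  # always keep at least GPU 0

def pick_batch_size(gpu_mem_gb):
    # Rule of thumb from the runs above: 16 fits 8 GB, 8 fits 4 GB.
    return 16 if gpu_mem_gb >= 8 else 8

print(clamp_gpus("0,1", 1))      # 0
print(clamp_gpus("0,1,2,3", 2))  # 0,1
print(pick_batch_size(4))        # 8
```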

2019/7/23

caffe-jacinto

Running make fails with:
AR -o .build_release/lib/libcaffe-nv.a
LD -o .build_release/lib/libcaffe-nv.so.0.17.0
/usr/bin/ld: cannot find -lnvidia-ml
collect2: error: ld returned 1 exit status
Makefile:600: recipe for target '.build_release/lib/libcaffe-nv.so.0.17.0' failed
make: *** [.build_release/lib/libcaffe-nv.so.0.17.0] Error 1
ref: ubuntu NVIDIA-SMI couldn't find libnvidia-ml.so library in your system.
Find where libnvidia-ml.so actually lives...
$ locate libnvidia-ml.so
/usr/lib/nvidia-410/libnvidia-ml.so
/usr/lib/nvidia-410/libnvidia-ml.so.1
/usr/lib/nvidia-410/libnvidia-ml.so.410.104
/usr/lib32/nvidia-410/libnvidia-ml.so
/usr/lib32/nvidia-410/libnvidia-ml.so.1
/usr/lib32/nvidia-410/libnvidia-ml.so.410.104
/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/usr/local/cuda-10.1/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
Then check ld.so.conf.d as that article suggests.

The cuda-10-0.conf settings are fine:
$ cat /etc/ld.so.conf.d/cuda-10-0.conf 
/usr/local/cuda-10.0/targets/x86_64-linux/lib
/usr/lib/nvidia-410
Why there are two copies of libnvidia-ml.so is explained clearly in this article.

The result is the same: still fails. Use make -n to see the full options of this ld step..
$ make -n .build_release/lib/libcaffe-nv.so.0.17.0
echo LD -o .build_release/lib/libcaffe-nv.so.0.17.0
g++ -shared -o .build_release/lib/libcaffe-nv.so.0.17.0 .build_release/src/caffe/proto/caffe.pb.o .build_release/src/caffe/net.o
.build_release/src/caffe/layer_factory.o .build_release/src/caffe/quantized_layer.o .build_release/src/caffe/tensor.o 
.build_release/src/caffe/layers/sigmoid_cross_entropy_loss_layer.o .build_release/src/caffe/layers/reshape_layer.o 
.build_release/src/caffe/layers/cudnn_dropout_layer.o .build_release/src/caffe/layers/hdf5_output_layer.o 
.build_release/src/caffe/layers/reduction_layer.o .build_release/src/caffe/layers/tanh_layer.o 
.build_release/src/caffe/layers/sigmoid_layer.o .build_release/src/caffe/layers/smooth_L1_loss_layer.o 
.build_release/src/caffe/layers/cudnn_tanh_layer.o .build_release/src/caffe/layers/silence_layer.o 
.build_release/src/caffe/layers/bias_layer.o .build_release/src/caffe/layers/detection_output_layer.o 
.build_release/src/caffe/layers/normalize_layer.o .build_release/src/caffe/layers/slice_layer.o 
.build_release/src/caffe/layers/pooling_layer.o .build_release/src/caffe/layers/detection_evaluate_layer.o 
.build_release/src/caffe/layers/relu_layer.o .build_release/src/caffe/layers/split_layer.o 
.build_release/src/caffe/layers/batch_reindex_layer.o .build_release/src/caffe/layers/cudnn_sigmoid_layer.o 
.build_release/src/caffe/layers/conv_layer.o .build_release/src/caffe/layers/bnll_layer.o 
.build_release/src/caffe/layers/base_conv_layer.o .build_release/src/caffe/layers/cudnn_softmax_layer.o 
.build_release/src/caffe/layers/crop_layer.o .build_release/src/caffe/layers/multinomial_logistic_loss_layer.o 
.build_release/src/caffe/layers/cudnn_lcn_layer.o .build_release/src/caffe/layers/base_data_layer.o 
.build_release/src/caffe/layers/rnn_layer.o .build_release/src/caffe/layers/flatten_layer.o 
.build_release/src/caffe/layers/permute_layer.o .build_release/src/caffe/layers/cudnn_lrn_layer.o 
.build_release/src/caffe/layers/dropout_layer.o .build_release/src/caffe/layers/cudnn_relu_layer.o 
.build_release/src/caffe/layers/hdf5_data_layer.o .build_release/src/caffe/layers/tile_layer.o 
.build_release/src/caffe/layers/mvn_layer.o 
.build_release/src/caffe/layers/lstm_layer.o .build_release/src/caffe/layers/lrn_layer.o 
.build_release/src/caffe/layers/argmax_layer.o 
.build_release/src/caffe/layers/euclidean_loss_layer.o .build_release/src/caffe/layers/python_layer.o 
.build_release/src/caffe/layers/image_label_data_layer.o .build_release/src/caffe/layers/annotated_data_layer.o 
.build_release/src/caffe/layers/cudnn_deconv_layer.o .build_release/src/caffe/layers/cudnn_batch_norm_layer.o 
.build_release/src/caffe/layers/contrastive_loss_layer.o .build_release/src/caffe/layers/exp_layer.o 
.build_release/src/caffe/layers/memory_data_layer.o .build_release/src/caffe/layers/log_layer.o 
.build_release/src/caffe/layers/accuracy_layer.o .build_release/src/caffe/layers/neuron_layer.o 
.build_release/src/caffe/layers/dummy_data_layer.o .build_release/src/caffe/layers/l1_loss_layer.o 
.build_release/src/caffe/layers/segmentation_accuracy_layer.o .build_release/src/caffe/layers/video_data_layer.o 
.build_release/src/caffe/layers/softmax_layer.o .build_release/src/caffe/layers/image_data_layer.o 
.build_release/src/caffe/layers/absval_layer.o .build_release/src/caffe/layers/deconv_layer.o 
.build_release/src/caffe/layers/detectnet_transform_layer.o .build_release/src/caffe/layers/eltwise_layer.o 
.build_release/src/caffe/layers/loss_layer.o .build_release/src/caffe/layers/input_layer.o 
.build_release/src/caffe/layers/infogain_loss_layer.o .build_release/src/caffe/layers/inner_product_layer.o 
.build_release/src/caffe/layers/hinge_loss_layer.o .build_release/src/caffe/layers/concat_layer.o 
.build_release/src/caffe/layers/multibox_loss_layer.o .build_release/src/caffe/layers/data_layer.o 
.build_release/src/caffe/layers/embed_layer.o .build_release/src/caffe/layers/lstm_unit_layer.o 
.build_release/src/caffe/layers/recurrent_layer.o .build_release/src/caffe/layers/elu_layer.o 
.build_release/src/caffe/layers/scale_layer.o .build_release/src/caffe/layers/cudnn_pooling_layer.o 
.build_release/src/caffe/layers/filter_layer.o .build_release/src/caffe/layers/cudnn_conv_layer.o 
.build_release/src/caffe/layers/im2col_layer.o .build_release/src/caffe/layers/threshold_layer.o 
.build_release/src/caffe/layers/prelu_layer.o .build_release/src/caffe/layers/power_layer.o 
.build_release/src/caffe/layers/spp_layer.o 
.build_release/src/caffe/layers/batch_norm_layer.o .build_release/src/caffe/layers/window_data_layer.o 
.build_release/src/caffe/layers/axpy_layer.o .build_release/src/caffe/layers/softmax_loss_layer.o 
.build_release/src/caffe/layers/prior_box_layer.o .build_release/src/caffe/blob.o .build_release/src/caffe/solver.o 
.build_release/src/caffe/syncedmem.o .build_release/src/caffe/util/gpu_memory.o .build_release/src/caffe/util/io.o 
.build_release/src/caffe/util/im_transforms.o .build_release/src/caffe/util/bbox_util.o .build_release/src/caffe/util/benchmark.o 
.build_release/src/caffe/util/sampler.o .build_release/src/caffe/util/cudnn.o .build_release/src/caffe/util/db_lmdb.o 
.build_release/src/caffe/util/hdf5.o .build_release/src/caffe/util/db.o .build_release/src/caffe/util/upgrade_proto.o 
.build_release/src/caffe/util/detectnet_coverage_rectangular.o .build_release/src/caffe/util/signal_handler.o 
.build_release/src/caffe/util/blocking_queue.o .build_release/src/caffe/util/db_leveldb.o 
.build_release/src/caffe/util/im2col.o 
.build_release/src/caffe/util/math_functions.o .build_release/src/caffe/util/insert_splits.o 
.build_release/src/caffe/solvers/nesterov_solver.o .build_release/src/caffe/solvers/adam_solver.o 
.build_release/src/caffe/solvers/sgd_solver.o .build_release/src/caffe/solvers/rmsprop_solver.o 
.build_release/src/caffe/solvers/adagrad_solver.o .build_release/src/caffe/solvers/adadelta_solver.o 
.build_release/src/caffe/data_transformer.o .build_release/src/caffe/parallel.o .build_release/src/caffe/common.o 
.build_release/src/caffe/batch_transformer.o .build_release/src/caffe/internal_thread.o .build_release/src/caffe/layer.o 
.build_release/src/caffe/data_reader.o .build_release/src/caffe/type.o .build_release/cuda/src/caffe/layers/filter_layer.o 
.build_release/cuda/src/caffe/layers/cudnn_tanh_layer.o .build_release/cuda/src/caffe/layers/silence_layer.o 
.build_release/cuda/src/caffe/layers/softmax_loss_layer.o .build_release/cuda/src/caffe/layers/log_layer.o 
.build_release/cuda/src/caffe/layers/sigmoid_layer.o .build_release/cuda/src/caffe/layers/recurrent_layer.o 
.build_release/cuda/src/caffe/layers/prelu_layer.o .build_release/cuda/src/caffe/layers/elu_layer.o 
.build_release/cuda/src/caffe/layers/accuracy_layer.o .build_release/cuda/src/caffe/layers/deconv_layer.o 
.build_release/cuda/src/caffe/layers/exp_layer.o .build_release/cuda/src/caffe/layers/cudnn_deconv_layer.o 
.build_release/cuda/src/caffe/layers/cudnn_lcn_layer.o .build_release/cuda/src/caffe/layers/cudnn_softmax_layer.o 
.build_release/cuda/src/caffe/layers/sigmoid_cross_entropy_loss_layer.o .build_release/cuda/src/caffe/layers/dropout_layer.o 
.build_release/cuda/src/caffe/layers/crop_layer.o .build_release/cuda/src/caffe/layers/threshold_layer.o 
.build_release/cuda/src/caffe/layers/im2col_layer.o .build_release/cuda/src/caffe/layers/embed_layer.o 
.build_release/cuda/src/caffe/layers/smooth_L1_loss_layer.o .build_release/cuda/src/caffe/layers/lstm_unit_layer.o 
.build_release/cuda/src/caffe/layers/slice_layer.o .build_release/cuda/src/caffe/layers/cudnn_relu_layer.o 
.build_release/cuda/src/caffe/layers/normalize_layer.o .build_release/cuda/src/caffe/layers/conv_layer.o 
.build_release/cuda/src/caffe/layers/contrastive_loss_layer.o .build_release/cuda/src/caffe/layers/inner_product_layer.o 
.build_release/cuda/src/caffe/layers/hdf5_output_layer.o .build_release/cuda/src/caffe/layers/cudnn_conv_layer.o 
.build_release/cuda/src/caffe/layers/bnll_layer.o .build_release/cuda/src/caffe/layers/hdf5_data_layer.o 
.build_release/cuda/src/caffe/layers/relu_layer.o .build_release/cuda/src/caffe/layers/cudnn_sigmoid_layer.o 
.build_release/cuda/src/caffe/layers/power_layer.o .build_release/cuda/src/caffe/layers/lrn_layer.o 
.build_release/cuda/src/caffe/layers/cudnn_batch_norm_layer.o .build_release/cuda/src/caffe/layers/tile_layer.o 
.build_release/cuda/src/caffe/layers/cudnn_pooling_layer.o .build_release/cuda/src/caffe/layers/euclidean_loss_layer.o 
.build_release/cuda/src/caffe/layers/bias_layer.o .build_release/cuda/src/caffe/layers/mvn_layer.o 
.build_release/cuda/src/caffe/layers/tanh_layer.o .build_release/cuda/src/caffe/layers/concat_layer.o 
.build_release/cuda/src/caffe/layers/split_layer.o .build_release/cuda/src/caffe/layers/detectnet_transform_layer.o 
.build_release/cuda/src/caffe/layers/absval_layer.o .build_release/cuda/src/caffe/layers/cudnn_lrn_layer.o 
.build_release/cuda/src/caffe/layers/detection_output_layer.o .build_release/cuda/src/caffe/layers/softmax_layer.o 
.build_release/cuda/src/caffe/layers/permute_layer.o .build_release/cuda/src/caffe/layers/base_data_layer.o 
.build_release/cuda/src/caffe/layers/l1_loss_layer.o .build_release/cuda/src/caffe/layers/reduction_layer.o 
.build_release/cuda/src/caffe/layers/scale_layer.o .build_release/cuda/src/caffe/layers/eltwise_layer.o 
.build_release/cuda/src/caffe/layers/batch_reindex_layer.o .build_release/cuda/src/caffe/layers/pooling_layer.o 
.build_release/cuda/src/caffe/layers/axpy_layer.o .build_release/cuda/src/caffe/layers/batch_norm_layer.o 
.build_release/cuda/src/caffe/layers/cudnn_dropout_layer.o .build_release/cuda/src/caffe/util/math_functions2.o 
.build_release/cuda/src/caffe/util/gpu_amax.o .build_release/cuda/src/caffe/util/gpu_sumsq.o 
.build_release/cuda/src/caffe/util/gpu_asum.o 
.build_release/cuda/src/caffe/util/im2col.o .build_release/cuda/src/caffe/util/bbox_util.o 
.build_release/cuda/src/caffe/util/math_functions.o .build_release/cuda/src/caffe/solvers/nesterov_solver.o 
.build_release/cuda/src/caffe/solvers/adagrad_solver.o .build_release/cuda/src/caffe/solvers/adam_solver.o 
.build_release/cuda/src/caffe/solvers/sgd_solver.o .build_release/cuda/src/caffe/solvers/rmsprop_solver.o 
.build_release/cuda/src/caffe/solvers/adadelta_solver.o .build_release/cuda/src/caffe/quantized_layer.o 
.build_release/cuda/src/caffe/data_transformer.o -Wl,-soname,libcaffe-nv.so.0.17 -Wl,-rpath,\$ORIGIN/../lib -pthread -fPIC
-DCAFFE_VERSION=0.17.0 -std=c++11 -DCUDA_NO_HALF -DNDEBUG -O2 -DUSE_CUDNN -DUSE_OPENCV -DUSE_LEVELDB -DUSE_LMDB -DWITH_PYTHON_LAYER
-I/usr/include/python2.7 -I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/include -I/usr/include/hdf5/serial
-I.build_release/src -I./src -I./include -I./3rdparty -I/usr/include/hdf5/serial -I/usr/local/cuda/include -I/cuda/include -I/include
-I/opt/OpenBLAS/include/ -Wall -Wno-sign-compare -L/usr/lib -L/usr/local/lib -L/usr/lib -L/usr/lib/x86_64-linux-gnu/hdf5/serial
-L/usr/local/cuda/lib64 -L/usr/lib/nvidia-396 -L/usr/lib/nvidia-390 -L/usr/lib/nvidia-387 -L/usr/lib/nvidia-384 -L/usr/lib/nvidia-381
-L/usr/lib/nvidia-375 -L/usr/lib/nvidia-367 -L/usr/lib/nvidia-361 -L/usr/lib/nvidia-352 -L/usr/local/cuda/lib -L/cuda/lib64 -L/lib64
-L/usr/lib/x86_64-linux-gnu/hdf5/serial -L/opt/OpenBLAS/lib/ -L.build_release/lib -L/usr/local/lib -lopencv_stitching -lopencv_superres
-lopencv_videostab -lopencv_aruco -lopencv_bgsegm -lopencv_bioinspired -lopencv_ccalib -lopencv_dnn -lopencv_dpm -lopencv_fuzzy
-lopencv_line_descriptor -lopencv_optflow -lopencv_plot -lopencv_reg -lopencv_saliency -lopencv_stereo -lopencv_structured_light
-lopencv_rgbd -lopencv_surface_matching -lopencv_tracking -lopencv_datasets -lopencv_text -lopencv_face -lopencv_xfeatures2d
-lopencv_shape -lopencv_video -lopencv_ximgproc -lopencv_calib3d -lopencv_features2d -lopencv_flann -lopencv_xobjdetect
-lopencv_objdetect -lopencv_ml -lopencv_xphoto -lippicv -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_photo
-lopencv_imgproc -lopencv_core -lcudart -lcublas -lcurand -lnvidia-ml -lboost_system -lglog -lgflags -lprotobuf -lboost_filesystem
-lm -lturbojpeg -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread
-lboost_regex -lstdc++ -lcudnn -lboost_python-py27 -lpython2.7 -lboost_regex -lopenblas

cd .build_release/lib; rm -f libcaffe-nv.so.0.17; ln -s libcaffe-nv.so.0.17.0 libcaffe-nv.so.0.17

cd .build_release/lib; rm -f libcaffe-nv.so;   ln -s libcaffe-nv.so.0.17 libcaffe-nv.so

So it isn't picking up the standard ld search path.
The only option is to patch the Makefile by hand..
diff --git a/Makefile b/Makefile
index 858886c..6ce5dca 100644
--- a/Makefile
+++ b/Makefile
@@ -179,7 +179,7 @@ CUDA_LIB_DIR :=
 # add /lib64 only if it exists
 ifneq ("$(wildcard $(CUDA_DIR)/lib64)","")
        CUDA_LIB_DIR += $(CUDA_DIR)/lib64
-       CUDA_LIB_DIR += /usr/lib/nvidia-396 /usr/lib/nvidia-390 /usr/lib/nvidia-387 /usr/lib/nvidia-384 /usr/lib/nvidia-381 /usr/lib/nvidia-375 /usr/lib/nvidia-367 /usr/lib/nvidia-361 /usr/lib/nvidia-352
+       CUDA_LIB_DIR += /usr/lib/nvidia-410 /usr/lib/nvidia-390 /usr/lib/nvidia-387 /usr/lib/nvidia-384 /usr/lib/nvidia-381 /usr/lib/nvidia-375 /usr/lib/nvidia-367 /usr/lib/nvidia-361 /usr/lib/nvidia-352
 endif
 CUDA_LIB_DIR += $(CUDA_DIR)/lib
 
With that, the build succeeds.
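What the Makefile patch does is add the directory the driver actually installed into (/usr/lib/nvidia-410) to the -L search list; the linker takes the first listed directory that contains the library. A small sketch emulating that search with temp dirs:

```python
import os
import tempfile

def find_library(search_dirs, name):
    # ld-style search: first directory on the -L list containing the file wins.
    for d in search_dirs:
        if os.path.exists(os.path.join(d, name)):
            return d
    return None

root = tempfile.mkdtemp()
stale = os.path.join(root, "nvidia-396")  # listed in the Makefile, but empty
real = os.path.join(root, "nvidia-410")   # where locate found the .so
os.makedirs(stale)
os.makedirs(real)
open(os.path.join(real, "libnvidia-ml.so"), "w").close()

# With only the stale dirs on the list, -lnvidia-ml is never found:
print(find_library([stale], "libnvidia-ml.so"))               # None
# Adding the nvidia-410 dir (the Makefile patch) fixes it:
print(find_library([stale, real], "libnvidia-ml.so") == real)  # True
```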


make runtest still reports a few failures:
[==========] 2101 tests from 283 test cases ran. (442196 ms total)
[  PASSED  ] 2097 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] LayerFactoryTest/0.TestCreateLayer, where TypeParam = caffe::CPUDevice
[  FAILED  ] LayerFactoryTest/1.TestCreateLayer, where TypeParam = caffe::CPUDevice
[  FAILED  ] LayerFactoryTest/2.TestCreateLayer, where TypeParam = caffe::GPUDevice
[  FAILED  ] LayerFactoryTest/3.TestCreateLayer, where TypeParam = caffe::GPUDevice

 4 FAILED TESTS
Makefile:560: recipe for target 'runtest' failed
make: *** [runtest] Error 1

ssd

ref:
make -j10
make pycaffe
You have to make pycaffe to get python/caffe/_caffe.so.

Afterwards, set the PYTHONPATH environment variable to the python/ directory so python can import caffe.

This ssd fork already adds ssd to the examples.
examples/ssd/ssd_pascal.py is the python script that generates the network, solver, and deploy files.
From its hard-coded layers it writes out the run script...
/jobs/VGGNet/VOC0712/SSD_300x300$ cat VGG_VOC0712_SSD_300x300.sh 
cd /home/charles-chang/caffessd
./build/tools/caffe train \
--solver="models/VGGNet/VOC0712/SSD_300x300/solver.prototxt" \
--weights="models/VGGNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel" \
--gpu 0 2>&1 | tee jobs/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300.log
It also writes the prototxt files this script needs.

In ssd_pascal.py, when run_soon = True, after creating the needed net files and shell script
it immediately calls subprocess.call() to run that script and start training.
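The run_soon pattern boils down to: write the job script, then launch it with subprocess.call(). A minimal sketch, with an echo line standing in for the real caffe train command:

```python
import os
import stat
import subprocess
import tempfile

# Minimal version of the run_soon pattern in ssd_pascal.py: generate the
# job script, then (when run_soon is True) launch it right away.
run_soon = True
job_dir = tempfile.mkdtemp()
job_file = os.path.join(job_dir, "train.sh")
with open(job_file, "w") as f:
    # Stand-in for the generated "caffe train --solver=... --weights=..." line.
    f.write("#!/bin/sh\necho training started\n")
os.chmod(job_file, stat.S_IRWXU)  # the real script is made executable too

if run_soon:
    rc = subprocess.call([job_file])
    print(rc)  # 0
```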

Following the instructions, download the .caffemodel from github
and put it under the models/VGGNet folder.
These are the pretrained convolution-layer parameters.

Then, per the instructions, download the training data into ~/data (ssd_pascal.py hard-codes the path),
and run create_list.sh and create_data.sh under caffe/data/VOC0712/.
They load the data under ~/data/ into lmdb.
create_list.sh and create_data.sh must run in that order; create_data.sh finally combines the list with the data, and creates links under examples/VOC0712/ pointing into ~/data/VOC....
Those links under examples/VOC0712/ are the data paths the upcoming training actually references (see train.prototxt).

After that, you can run ssd_pascal.py.

Because the GPU count is hard-coded, it crashes with Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal.
Change the number of usable GPUs as described earlier.
Likewise it runs out of memory, so change batch_size and accum_batch_size:
diff --git a/examples/ssd/ssd_pascal.py b/examples/ssd/ssd_pascal.py
index 62129ba..1de3c5b 100644
--- a/examples/ssd/ssd_pascal.py
+++ b/examples/ssd/ssd_pascal.py
@@ -329,13 +329,13 @@ clip = False
 
 # Solver parameters.
 # Defining which GPUs to use.
-gpus = "0,1,2,3"
+gpus = "0"
 gpulist = gpus.split(",")
 num_gpus = len(gpulist)
 
 # Divide the mini-batch to different GPUs.
-batch_size = 32
-accum_batch_size = 32
+batch_size = 16
+accum_batch_size = 16
 iter_size = accum_batch_size / batch_size
 solver_mode = P.Solver.CPU
 device_id = 0
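A sketch of how those solver variables interact, mirroring (not copying) the arithmetic in ssd_pascal.py: the mini-batch is split across the GPUs in the list, and gradients are accumulated over iter_size iterations so the effective batch size stays at accum_batch_size.

```python
def solver_batching(gpus, batch_size, accum_batch_size):
    # Mirrors the variables in ssd_pascal.py (integer division, python3).
    num_gpus = len(gpus.split(","))
    iter_size = accum_batch_size // batch_size       # gradient accumulation steps
    batch_size_per_device = batch_size // num_gpus   # mini-batch slice per GPU
    return num_gpus, iter_size, batch_size_per_device

# The patched single-GPU settings above:
print(solver_batching("0", 16, 16))       # (1, 1, 16)
# The original 4-GPU defaults:
print(solver_batching("0,1,2,3", 32, 32))  # (4, 1, 8)
# Shrinking only batch_size keeps the effective batch via accumulation:
print(solver_batching("0", 8, 32))        # (1, 4, 8)
```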

The final commands run by create_data.sh are:
/home/checko/caffessd/build/tools/convert_annoset --anno_type=detection --label_type=xml 
--label_map_file=/home/checko/caffessd/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt 
--check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb 
--shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False 
/home/checko/data/VOCdevkit/ /home/checko/caffessd/data/VOC0712/../../data/VOC0712/test.txt 
/home/checko/data/VOCdevkit/VOC0712/lmdb/VOC0712_test_lmdb
/home/checko/caffessd/build/tools/convert_annoset --anno_type=detection --label_type=xml 
--label_map_file=/home/checko/caffessd/data/VOC0712/../../data/VOC0712/labelmap_voc.prototxt 
--check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb 
--shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False 
/home/checko/data/VOCdevkit/ /home/checko/caffessd/data/VOC0712/../../data/VOC0712/trainval.txt 
/home/checko/data/VOCdevkit/VOC0712/lmdb/VOC0712_trainval_lmdb

Training then fails with:
F0816 17:06:47.531648  3928 math_functions.cpp:250] Check failed: a <= b (0 vs. -1.19209e-07) 
*** Check failure stack trace: ***
    @     0x7fdb6c1a60cd  google::LogMessage::Fail()
    @     0x7fdb6c1a7f33  google::LogMessage::SendToLog()
    @     0x7fdb6c1a5c28  google::LogMessage::Flush()
    @     0x7fdb6c1a8999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fdb6c8b6087  caffe::caffe_rng_uniform<>()
    @     0x7fdb6c8716a8  caffe::SampleBBox()
    @     0x7fdb6c871a00  caffe::GenerateSamples()
    @     0x7fdb6c871c50  caffe::GenerateBatchSamples()
    @     0x7fdb6c9bcdd2  caffe::AnnotatedDataLayer<>::load_batch()
    @     0x7fdb6c953faa  caffe::BasePrefetchingDataLayer<>::InternalThreadEntry()
    @     0x7fdb6ca9c965  caffe::InternalThread::entry()
    @     0x7fdb60df4bcd  (unknown)
    @     0x7fdb4d0f06db  start_thread
    @     0x7fdb6a8e988f  clone
Then this post, "caffe 错误 math_functions.cpp Check failed: a <= b", says to modify SampleBBox()..
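The check that trips is caffe_rng_uniform's precondition a <= b: SampleBBox can hand it an upper bound a few float ulps below the lower bound (the 0 vs -1.19209e-07 in the log). A sketch of the precondition and of the clamping idea that kind of fix applies (my reading of the referenced post, not its exact patch):

```python
import random

def rng_uniform(a, b):
    # Sketch of caffe_rng_uniform's CHECK(a <= b) precondition.
    assert a <= b, "Check failed: a <= b (%g vs. %g)" % (a, b)
    return random.uniform(a, b)

a, b = 0.0, -1.19209e-07  # the exact pair from the crash log
b = max(a, b)             # clamp the upper bound instead of aborting
print(rng_uniform(a, b))  # 0.0
```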

2019/6/26

/etc/profile.d

It turns out that after login, the shell first runs /etc/profile.
At its end, /etc/profile has a loop that runs every script under /etc/profile.d.
So any global setting of your own can simply be dropped into /etc/profile.d/.
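The loop at the end of /etc/profile sources every readable *.sh under profile.d into the login shell. That behavior can be emulated from python against a temp directory (DEMO_VAR is a stand-in name):

```python
import os
import subprocess
import tempfile

# Emulate the /etc/profile loop against a temp profile.d directory.
profile_d = tempfile.mkdtemp()
with open(os.path.join(profile_d, "conda.sh"), "w") as f:
    f.write("export DEMO_VAR=hello\n")  # stand-in for the anaconda setup

# The same shape as the loop in /etc/profile: source each readable *.sh.
loop = ('for f in %s/*.sh; do [ -r "$f" ] && . "$f"; done; '
        'echo "$DEMO_VAR"') % profile_d
out = subprocess.check_output(["sh", "-c", loop]).decode().strip()
print(out)  # hello
```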

For example, after installing anaconda there is a conda.sh under $(INSTALLPATH)/etc/profile.d/.
Create a link to it at /etc/profile.d/conda.sh,
and every user automatically runs the anaconda environment-setup script at login.

2019/6/21

disable touchpad

:~$ xinput list
⎡ Virtual core pointer                     id=2 [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer               id=4 [slave  pointer  (2)]
⎜   ↳ Kensington      Kensington Expert Mouse  id=10 [slave  pointer  (2)]
⎜   ↳ SynPS/2 Synaptics TouchPad               id=13 [slave  pointer  (2)]
⎜   ↳ TPPS/2 IBM TrackPoint                    id=14 [slave  pointer  (2)]
⎣ Virtual core keyboard                    id=3 [master keyboard (2)]
    ↳ Virtual core XTEST keyboard              id=5 [slave  keyboard (3)]
    ↳ Power Button                             id=6 [slave  keyboard (3)]
    ↳ Video Bus                                id=7 [slave  keyboard (3)]
    ↳ Video Bus                                id=8 [slave  keyboard (3)]
    ↳ Power Button                             id=9 [slave  keyboard (3)]
    ↳ Integrated Camera                        id=11 [slave  keyboard (3)]
    ↳ AT Translated Set 2 keyboard             id=12 [slave  keyboard (3)]
    ↳ ThinkPad Extra Buttons                   id=15 [slave  keyboard (3)]
:~$ xinput --disable 13
First list all input devices with xinput, then turn the touchpad off with --disable.

After switching to wayland this xinput approach stopped working; the name of my touchpad no longer shows up.
So, following an answer added 11 years later in this thread, use the gnome setting:

Disable:
gsettings set org.gnome.desktop.peripherals.touchpad send-events disabled

Enable:
gsettings set org.gnome.desktop.peripherals.touchpad send-events enabled

2019/6/19

python3 and tensorflow 1.X

python3 -m venv env
source env/bin/activate
pip install tensorflow==1.3
Without pinning 1.3, the install hits a pile of compatibility problems.
Without venv, it installs into the system directories, which causes even bigger problems.
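A quick guard before installing anything: inside an env created with python3 -m venv, sys.prefix points into the env while sys.base_prefix still points at the system python, so the two differing is a reliable "am I in a venv" check.

```python
import sys

def in_virtualenv():
    # venv (py3) sets sys.base_prefix; old virtualenv set sys.real_prefix.
    base = getattr(sys, "base_prefix", None) or getattr(sys, "real_prefix", sys.prefix)
    return sys.prefix != base

print(type(in_virtualenv()).__name__)  # bool
```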


On ubuntu 18.04 with python3 (3.6.9), installing tensorflow==1.14.0 with pip3 fails with an error along the lines of:
protobuf need python > 3.7
Upgrading pip3 makes this go away.

How to upgrade pip3:
python -m pip install --upgrade pip
In practice, since this runs without sudo, the command installs the new pip under ~/.local/bin.
You have to log in again before it gets picked up.

Not knowing the proper way to upgrade pip3, I ran:
python3 -m pip install --upgrade pip
That produced pip3 and pip3.6 under ~/.local/bin, but it also rewrote pip to point at python3.6.
To get pip back onto python2.7, I copied pip2 in .local/bin over pip.

Apparently the right way is...
pip3 install --upgrade pip