2019/11/26

'+' mark at the end of file attribute

drwxrwxr-x  30 aiden aiden      4096 Nov 24 22:18 tmp
drwxrwxr-x+ 30 aiden aiden      4096 Nov 25 00:30 aosp
-rw-r-----+  1 aiden aiden         0 Nov 25 14:39 1234.txt
Like this, some directories/files end their attribute string with a '+'.

According to "+ or @ after ls -al":
The + suffix means the file has an access control list, and is common in any *nix that supports ACLs. Giving ls the -e flag will make it show the associated ACLs after the file, and chmod can be used to modify them. Most of this is from the chmod man page:

2019/11/25

mkfs.fat 4.1 -- reserved sector = 1

ref: FAT filesystem
Name             Offset
---------------------------
JmpBoot       :  0
OEMName       :  3
Bytes Per Sec : 11
Sec Per Clust : 13
Reserv Sec Cnt: 14
Num FATs      : 16
So the Reserved Sector Count is at byte 14 (0x0e):

Number of sectors in the reserved area. This field must not be 0, because the reserved area contains the boot sector itself, which holds this BPB. To avoid compatibility problems, it should be 1 on FAT12/16 volumes, because some old FAT drivers ignore this field and assume the reserved area is 1 sector. On FAT32 volumes it is typically 32. Microsoft's OSes properly support any value of 1 or larger.
fat16copy.py, the part that opens the image file:
    f = open(path, "r+b")

    self.f = f

    f.seek(0xb)
    bytes_per_sector = read_le_short(f)
    print(bytes_per_sector)
    sectors_per_cluster = read_byte(f)
    print(sectors_per_cluster)

    self.bytes_per_cluster = bytes_per_sector * sectors_per_cluster

    reserved_sectors = read_le_short(f)
    print(reserved_sectors)
    assert reserved_sectors == 1, \
        "Can only handle FAT with 1 reserved sector"

reserved_sectors is read at 0x0b + 2 (short) + 1 (byte) = 0x0e, which checks out.
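The offset arithmetic can be verified standalone. A minimal sketch (my own, not part of fat16copy.py) that does the same three reads with struct, fed the 16 boot-sector bytes from the hd dump of a default mkfs.fat 4.1 image below:

```python
import io
import struct

def read_bpb_fields(f):
    # Same reads as fat16copy.py: seek to 0x0b, then a little-endian
    # short (bytes/sector), one byte (sectors/cluster), and another
    # little-endian short -- which lands at 0x0b + 2 + 1 = 0x0e,
    # the Reserved Sector Count.
    f.seek(0x0B)
    return struct.unpack("<HBH", f.read(5))

# First 16 bytes of a default mkfs.fat 4.1 image (see the hd dump):
default_img = bytes.fromhex("eb3c906d6b66732e6661740002101000")
bps, spc, rsvd = read_bpb_fields(io.BytesIO(default_img))
print(bps, spc, rsvd)  # 512 16 16 -- reserved sectors is 16, not 1
```

With the -R1 image (byte 0x0e is 01), the same reads give a reserved sector count of 1.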

Check with hd and mkfs.fat:
~$ mkfs.fat 500M
mkfs.fat 4.1 (2017-01-24)
charles-chang@zoeymkII:~$ hd -n 16 500M 
00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 10 10 00  |.<.mkfs.fat.....|
00000010
charles-chang@zoeymkII:~$ mkfs.fat -a -R1 500M
mkfs.fat 4.1 (2017-01-24)
charles-chang@zoeymkII:~$ hd -n 16 500M 
00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 10 01 00  |.<.mkfs.fat.....|
00000010
So for mkfs.fat 4.1, adding these two options, -a and -R1:
  • -a : disable alignment
  • -R1 : reserved sectors = 1
lets 4.1 also produce a FAT image with reserved_sectors=1 correctly.

Tested with fat16copy.py: OK.

2019/11/20

build android P for rpi3

Following this, try Android P on the rpi 3.

android sources from google.(https://android.googlesource.com/platform/manifest)
As described there, clone device/brobwind/rpi3, add the local_manifest, then repo sync:
repo init -u https://android.googlesource.com/platform/manifest -b android-9.0.0_r50
repo sync

A mkfs.fat error shows up, reportedly a 4.1-version issue, so:
build/make$ git diff
diff --git a/tools/fat16copy.py b/tools/fat16copy.py
index c20930a47..18541e88a 100755
--- a/tools/fat16copy.py
+++ b/tools/fat16copy.py
@@ -465,8 +465,8 @@ class fat(object):
     self.bytes_per_cluster = bytes_per_sector * sectors_per_cluster
 
     reserved_sectors = read_le_short(f)
-    assert reserved_sectors == 1, \
-        "Can only handle FAT with 1 reserved sector"
+#    assert reserved_sectors == 1, \
+#        "Can only handle FAT with 1 reserved sector"
 
     fat_count = read_byte(f)
     assert fat_count == 2, "Can only handle FAT with 2 tables"
==> This has no effect; see here for details.

Then also: No module named mako.template
Fixed with apt-get install python-mako.

The out directory sits on a loopback device, accessed over the network. Clean build time:
#### build completed successfully (02:48:25 (hh:mm:ss)) ####
The out directory takes 54 GB in total.

Because the out directory is a loopback device (disk image), the NFS export needs the nohide,crossmnt options.

Flashing (SD card)

Under out/target/product/rpi3...
out/target/product/rpi3$ sudo OUT=. ~/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/create_partition_table.sh /dev/sdb
 => Destroy partition table ...

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory. 
***************************************************************

GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
 => Install GPT partition table ...
 => Install hybrid MBR partition table ...
 => Install images ....
     => Install: rpiboot(./rpiboot.img) image ...
131072+0 records in
131072+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 17.1989 s, 3.9 MB/s
     => Install: boot_a(./boot.img) image ...
39008+0 records in
39008+0 records out
19972096 bytes (20 MB, 19 MiB) copied, 5.31937 s, 3.8 MB/s
     => Install: system_a(./system.img) image ...
1331200+0 records in
1331200+0 records out
681574400 bytes (682 MB, 650 MiB) copied, 152.134 s, 4.5 MB/s
     => Install: misc(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/misc.img) image ...
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.336952 s, 3.1 MB/s
     => Install: vendor_a(./vendor.img) image ...
524288+0 records in
524288+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 69.0102 s, 3.9 MB/s
     => Install: oem_bootloader_a(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/oem_bootloader_a.img) image ...
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 1.17575 s, 3.6 MB/s
     => Install: userdata(/home/charles-chang/zoeymkiihome/pi3p/device/brobwind/rpi3/boot/images/zero_4k.bin) image ...
8+0 records in
8+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0298523 s, 137 kB/s
 => Dump partition table ....
Disk /dev/sdb: 15564800 sectors, 7.4 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A395EE6C-31A1-4A03-BE8E-6A65F0700662
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 15564766
Partitions will be aligned on 8-sector boundaries
Total free space is 6 sectors (3.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              40          131111   64.0 MiB    FFFF  rpiboot
   2          131112          133159   1024.0 KiB  FFFF  uboot_a
   3          133160          135207   1024.0 KiB  FFFF  uboot_b
   4          135208          200743   32.0 MiB    FFFF  boot_a
   5          200744          266279   32.0 MiB    FFFF  boot_b
   6          266280         1597479   650.0 MiB   FFFF  system_a
   7         1597480         2928679   650.0 MiB   FFFF  system_b
   8         2928680         2928807   64.0 KiB    FFFF  vbmeta_a
   9         2928808         2928935   64.0 KiB    FFFF  vbmeta_b
  10         2928936         2930983   1024.0 KiB  FFFF  misc
  11         2930984         3455271   256.0 MiB   FFFF  vendor_a
  12         3455272         3979559   256.0 MiB   FFFF  vendor_b
  13         3979560         3987751   4.0 MiB     FFFF  oem_bootloader_a
  14         3987752         3995943   4.0 MiB     FFFF  oem_bootloader_b
  15         3995944         4000039   2.0 MiB     FFFF  frp
  16         4000040         4786471   384.0 MiB   FFFF  swap
  17         4786472        15564766   5.1 GiB     FFFF  userdata
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A B4DDDDC3-FF83-4D95-91DC-4999ADB836DF rpiboot
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A E88B3641-48BB-4D5F-892B-A08B075E6E9F uboot_a
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 8452CDDF-4B7C-42FA-ACB9-1B279553D720 uboot_b
PART: BB499290-B57E-49F6-BF41-190386693794 4ECCB503-5551-490F-B5D5-9D0BDCAC95D7 boot_a
PART: BB499290-B57E-49F6-BF41-190386693794 E579E168-002E-453B-8161-BE9FE76B1390 boot_b
PART: 0F2778C4-5CC1-4300-8670-6C88B7E57ED6 89313297-5363-4E37-B15E-AC87A1F19379 system_a
PART: 0F2778C4-5CC1-4300-8670-6C88B7E57ED6 28A6E00E-C1C1-481A-A087-142A8933B7C8 system_b
PART: B598858A-5FE3-418E-B8C4-824B41F4ADFC 4A773582-3F2D-4ADD-9AF1-4FAD6D78BB7A vbmeta_a
PART: B598858A-5FE3-418E-B8C4-824B41F4ADFC F60C759C-D9EC-4A65-8A24-3DEB5C214204 vbmeta_b
PART: 6B2378B0-0FBC-4AA9-A4F6-4D6E17281C47 BEB9C837-9DF1-4190-BED5-5F9DAD2AF268 misc
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 1EBD7A2C-2D8C-4CF2-BDED-535219A036AB vendor_a
PART: 314F99D5-B2BF-4883-8D03-E2F2CE507D6A 2039ACAF-EE39-45E2-A02A-5C5927C5AC43 vendor_b
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 39E6FD6B-58EB-4FB1-BDC5-378B8D888A5E oem_bootloader_a
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 BB63D919-A323-477E-B4ED-DB62C707C6C0 oem_bootloader_b
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 AD200506-D3FC-4CB2-AADD-0E1BDABEFB1C frp
PART: AA3434B2-DDC3-4065-8B1A-18E99EA15CB7 59501E98-7479-4BBF-82B6-0585BBFE83EB swap
PART: 0BB7E6ED-4424-49C0-9372-7FBAB465AB4C 9037D9DF-4E2A-4E1F-8C4E-ADBDDBBF2F71 userdata
Boot fails. The first partition turns out to be empty; mounting out/.../rpiboot.img shows nothing either.

18:04:05 prebuilts/build-tools/linux-x86/bin/ninja 
[prebuilts/build-tools/linux-x86/bin/ninja -d keepdepfile rpibootimage -j 10 -f out/combined-rpi3.ninja -v -w dupbuild=err]
[100% 1/1] 
/bin/bash -c "(echo \"Target rpiboot fs image: out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img\" ) 
&& (mkdir -p out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/ ) 
&& (dd if=/dev/zero of=out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img bs=\$((1024*1024)) count=64 ) 
&& (mkfs.fat -n \"rpiboot\" out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img ) 
&& (for item in  out/target/product/rpi3/rpiboot/bootcode.bin out/target/product/rpi3/rpiboot/fixup_cd.dat 
out/target/product/rpi3/rpiboot/fixup.dat 
out/target/product/rpi3/rpiboot/fixup_db.dat 
out/target/product/rpi3/rpiboot/fixup_x.dat 
out/target/product/rpi3/rpiboot/start_cd.elf 
out/target/product/rpi3/rpiboot/start_db.elf 
out/target/product/rpi3/rpiboot/start.elf 
out/target/product/rpi3/rpiboot/start_x.elf 
out/target/product/rpi3/rpiboot/issue.txt 
out/target/product/rpi3/rpiboot/LICENCE.broadcom 
out/target/product/rpi3/rpiboot/LICENSE.oracle 
out/target/product/rpi3/rpiboot/SHA1SUM 
out/target/product/rpi3/rpiboot/cmdline.txt 
out/target/product/rpi3/rpiboot/config.txt 
out/target/product/rpi3/rpiboot/u-boot-dtok.bin 
out/target/product/rpi3/rpiboot/uboot.env 
out/target/product/rpi3/rpiboot/overlays/chosen-serial0.dtbo 
out/target/product/rpi3/rpiboot/overlays/rpi-uart-skip-init.dtbo 
out/target/product/rpi3/rpiboot/bcm2710-rpi-3-b.dtb 
out/target/product/rpi3/rpiboot/bcm2710-rpi-3-b-plus.dtb 
out/target/product/rpi3/rpiboot/overlays/bcm2710-rpi-3-b-android-fstab.dtbo 
out/target/product/rpi3/rpiboot/overlays/bcm2710-rpi-3-b-cpufreq.dtbo 
out/target/product/rpi3/rpiboot/overlays/pwm-2chan.dtbo 
out/target/product/rpi3/rpiboot/overlays/sdtweak.dtbo 
out/target/product/rpi3/rpiboot/overlays/vc4-kms-v3d.dtbo; 
do if [ \"\`dirname \${item}\`\" = \"out/target/product/rpi3/rpiboot\" ] ;
 then build/make/tools/fat16copy.py out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img \${item} ; 
fi ; done ) 
&& (for item in overlays; do build/make/tools/fat16copy.py 
out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img 
out/target/product/rpi3/rpiboot/\${item} ; done ) 
&& (echo \"Install rpiboot fs image: out/target/product/rpi3/rpiboot.img\" ) 
&& (prebuilts/build-tools/linux-x86/bin/acp out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img out/target/product/rpi3/rpiboot.img )"
Target rpiboot fs image: out/target/product/rpi3/obj/PACKAGING/rpibootimage_intermediates/rpiboot.img
Manually mounting SD card partition 1 and copying over all the files under out/.../rpiboot/ makes it boot.
So the guess is that the fat16copy.py change didn't take effect.

Check whether mkfs.fat has an option for the reserved sectors:
mkfs.fat 4.1 (2017-01-24)
No device specified.
Usage: mkfs.fat [-a][-A][-c][-C][-v][-I][-l bad-block-file][-b backup-boot-sector]
       [-m boot-msg-file][-n volume-name][-i volume-id]
       [-s sectors-per-cluster][-S logical-sector-size][-f number-of-FATs]
       [-h hidden-sectors][-F fat-size][-r root-dir-entries][-R reserved-sectors]
       [-M FAT-media-byte][-D drive_number]
       [--invariant]
       [--help]
       /dev/name [blocks]
Also take a look at build/make/tools/fat16copy.py; in main:
    print("Usage: fat16copy.py <image> <file> [<file> ...]")
    print("Files are copied into the root of the image.")
    print("Directories are copied recursively")
It's just a handy tool for copying files straight into an image file without loop mounting.

======== Refer to "reserved sector = 1" above
So the fix is:
device/brobwind/rpi3$ git diff
diff --git a/build/tasks/rpiboot.mk b/build/tasks/rpiboot.mk
index 10d3d74..871633f 100644
--- a/build/tasks/rpiboot.mk
+++ b/build/tasks/rpiboot.mk
@@ -73,7 +73,7 @@ unique_rpiboot_copy_files_destinations_dirs := $(filter-out .,$(patsubst %/,%,$(
 define build-rpibootimage-target
        mkdir -p $(dir $(1))
        dd if=/dev/zero of=$(1) bs=$$((1024*1024)) count=$(2)
-       mkfs.fat -n "rpiboot" $(1)
+       mkfs.fat -a -R1 -n "rpiboot" $(1)
        for item in $(ALL_INSTALLED_RPIBOOT_FILES); do \
                if [ "`dirname $${item}`" = "$(RPIBOOT_OUT_ROOT)" ] ; then \
                        $(FAT16COPY) $(1) $${item} ; \

2019/11/14

put driver firmware bin file into kernel

Look at a relatively simple(?) firmware driver: WHITEHEAT.
It's a USB serial converter, so look among the USB client drivers (USB EZ...).
After enabling it, the make log:
make -f /home/charles-chang/mt2712robot/kernel-4.9/scripts/Makefile.build obj=firmware
  FWNAME="whiteheat_loader.fw"; FWSTR="whiteheat_loader_fw"; ASM_WORD=.quad; ASM_ALIGN=3; PROGBITS=@progbits;
 echo "/* Generated by firmware/Makefile */"               > firmware/whiteheat_loader.fw.gen.S;
 echo "    .section .rodata"                               >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_${FWSTR}_bin:"                          >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .incbin \"firmware/whiteheat_loader.fw\""                               >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_end:"                                   >>firmware/whiteheat_loader.fw.gen.S;
 echo "   .section .rodata.str,\"aMS\",${PROGBITS},1"      >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "_fw_${FWSTR}_name:"                         >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .string \"$FWNAME\""                    >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .section .builtin_fw,\"a\",${PROGBITS}" >>firmware/whiteheat_loader.fw.gen.S;
 echo "    .p2align ${ASM_ALIGN}"                  >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_${FWSTR}_name"          >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_${FWSTR}_bin"           >>firmware/whiteheat_loader.fw.gen.S;
 echo "    ${ASM_WORD} _fw_end - _fw_${FWSTR}_bin" >>firmware/whiteheat_loader.fw.gen.S;
  gcc -Wp,-MD,firmware/.ihex2fw.d -Ifirmware -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89
 -o firmware/ihex2fw /home/charles-chang/mt2712robot/kernel-4.9/firmware/ihex2fw.c
This converts the *.HEX files under firmware/ into binary and places them in a dedicated rodata section.
They become part of the kernel itself, so the section labels give access to the location of the binary.
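The echo sequence in the log boils down to a small generator. A sketch (mine, mirroring the log's output, not actual kernel build code):

```python
def gen_fw_asm(fwname, fwstr, asm_word=".quad", asm_align=3,
               progbits="@progbits"):
    """Reproduce the .gen.S file the Makefile's echo sequence emits:
    the blob is .incbin'd into .rodata, its name goes into a string
    section, and a (name, start, size) record goes into .builtin_fw."""
    lines = [
        "/* Generated by firmware/Makefile */",
        "    .section .rodata",
        "    .p2align %d" % asm_align,
        "_fw_%s_bin:" % fwstr,
        '    .incbin "firmware/%s"' % fwname,
        "_fw_end:",
        '    .section .rodata.str,"aMS",%s,1' % progbits,
        "    .p2align %d" % asm_align,
        "_fw_%s_name:" % fwstr,
        '    .string "%s"' % fwname,
        '    .section .builtin_fw,"a",%s' % progbits,
        "    .p2align %d" % asm_align,
        "    %s _fw_%s_name" % (asm_word, fwstr),
        "    %s _fw_%s_bin" % (asm_word, fwstr),
        "    %s _fw_end - _fw_%s_bin" % (asm_word, fwstr),
    ]
    return "\n".join(lines)

print(gen_fw_asm("whiteheat_loader.fw", "whiteheat_loader_fw"))
```

The .builtin_fw records are what the kernel walks at request_firmware() time to find built-in blobs.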

But this is a poor approach.

Another way to do this is via menuconfig, without modifying kernel source.
ref: Usage and Mechanism of kernel function "request_firmware()"

menuconfig -- Device Drivers -- Generic Driver Options -- firmware....
One of the entries there is the firmware list. Since the default firmware dir is firmware/, just drop the bin file under firmware/,
then write the bin file's name into that FIRMWARE option in menuconfig.

This way there's no need to modify the kernel Makefile, nor to convert the bin file to HEX.
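In .config terms, the menuconfig entries above correspond to something like the following fragment (option names CONFIG_EXTRA_FIRMWARE / CONFIG_EXTRA_FIRMWARE_DIR are the mainline kernel's; the whiteheat file name is just the running example):

```
CONFIG_FW_LOADER=y
# Firmware blobs to build into the kernel binary
CONFIG_EXTRA_FIRMWARE="whiteheat_loader.fw"
# Directory (relative to the source tree) holding those blobs
CONFIG_EXTRA_FIRMWARE_DIR="firmware"
```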

2019/11/8

Nvidia DIGITS.

Since nvidia/caffe is linked to DIGITS, give it a try.
The docs use NVIDIA's old docker;
try the new one:
~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvidia/digits              
Then open ip:5000 in a browser, and it comes up as expected...
Newer images apparently have to come from NGC, which is a bit of a hassle...

The dataset has to be downloaded separately; then the dataset folder is specified in the web UI. Everything else can be done on the web page.
See this post, or the official docs, for clearer step-by-step instructions.

Since images are uploaded from the browser on the client PC, it's still best to specify a volume mapping when running the image.

The official repo has some training examples,
including a semantic segmentation example.

Following along to create the VOC dataset, the size keeps showing 0.
Export the log to look. At the start there is:
libdc1394 error: Failed to initialize libdc1394
2019-11-08 09:03:39 [INFO ] Created features db for stage train_db in train_db/features
2019-11-08 09:03:42 [INFO ] Created labels db for stage train_db in train_db/labels
2019-11-08 09:03:42 [INFO ] Processed 1/46
2019-11-08 09:03:42 [INFO ] Processed 2/46
2019-11-08 09:03:42 [INFO ] Processed 3/46
Then found... this post:
# ln /dev/null /dev/raw1394
The error no longer appears in the log, but the page still shows size 0.
Switching to the Datasets tab and clicking VOC to open its page, it's no longer 0; following the instructions to export the db, the images are visible...

Next, run the script to produce the pretrained parameters (caffemodel),
customize the network (cut-and-paste),
and training fails with an error:
ERROR: Cannot copy param 0 weights from layer 'fc6'; 
shape mismatch. Source param shape is 1 1 4096 9216 (37748736); 
target param shape is 4096 256 6 6 (37748736). 
To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
Googled answers point to Stack Overflow:
Rename the layer from "loss1/classifier" to "loss1/classifier_retrain".

When fine-tuning a model, here's what Caffe does:

# pseudo-code
for layer in new_model:
  if layer.name in old_model:
    new_model.layer.weights = old_model.layer.weights

You're getting an error because the weights for "loss1/classifier" were for a 1000-class classification problem (1000x1024), 
and you're trying to copy them into a layer for a 6-class classification problem (6x1024). 
When you rename the layer, Caffe doesn't try to copy the weights for that layer and you get randomly initialized weights - which is what you want.
It turns out that's not it at all...
It's simply that converting the AlexNet pretrained model into the FCN-AlexNet pretrained model requires running net_surgery.sh.
The script downloads bvlc_alexnet.XX and needs to reference fcn_alexnet.deploy.prototxt.
That fcn_alexnet.deploy.prototxt lives in DIGITS under examples/semantic-segmentation, so clone the repo and copy the file into the directory where net_surgery.sh runs.
Only then is fcn_alexnet.caffemodel generated correctly.


The docker version on GitHub only goes up to 16, so the GAN model from the examples isn't installed.
The repo referenced in the NVIDIA Deep Learning DIGITS documentation goes up to 19.x.

Per the instructions:
docker pull nvcr.io/nvidia/digits:19.10-tensorflow
Then start it following the command above:
~$ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility -itd -p 5000:5000 -v ~/dockerfolder:/dockerfolder nvcr.io/nvidia/digits:19.10-tensorflow
Then open a browser on port 5000 to reach the DIGITS web UI; under New Image on the right, GAN is now available.

2019/11/7

ffmpeg cuda -- scale

See the ffmpeg HWAccelIntro about using scale_cuda for resizing:
NVDEC/CUVID

NVDEC offers decoders for H.264, HEVC, MJPEG, MPEG-1/2/4, VP8/VP9, VC-1. Codec support varies by hardware (see the ​GPU compatibility table).

Note that FFmpeg offers both NVDEC and CUVID hwaccels. They differ in how frames are decoded and forwarded in memory.

The full set of codecs being available only on Pascal hardware, which adds VP9 and 10 bit support. The note about missing ffnvcodec from NVENC applies for NVDEC as well.

Sample decode using NVDEC:
ffmpeg -hwaccel nvdec input output
Sample decode using CUVID:
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -i input output
Full hardware transcode with CUVID and NVENC:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i input -c:v h264_nvenc -preset slow output
If ffmpeg was compiled with support for libnpp, it can be used to insert a GPU based scaler into the chain:
ffmpeg -hwaccel_device 0 -hwaccel cuvid -i input -vf scale_npp=-1:720 -c:v h264_nvenc -preset slow output.mkv
The -hwaccel_device option can be used to specify the GPU to be used by the hwaccel in ffmpeg.

In practice:
ffmpeg -hwaccel cuvid -c:v h264_cuvid -i mvideo.mp4 -vf scale_npp=2048:-1 -c:v h264_nvenc t1.mp4
With -c:v h264_cuvid in front, -c:v h264_nvenc must also be added before the output filename.
Otherwise you get the error "Impossible to convert between the formats supported by the filter..." ([FFmpeg-user]).
With CUDA the transcoding rate is 2.5x realtime;
without CUDA, on the CPU, it's 0.8x.
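The -1 in scale_npp=2048:-1 tells the filter to derive the missing dimension from the source aspect ratio. A sketch of that computation (my own helper; ffmpeg's exact rounding/alignment may differ, and many encoders additionally require even dimensions):

```python
def auto_dim(src_w, src_h, dst_w=-1, dst_h=-1):
    # A -1 on one side means "derive it from the source aspect ratio",
    # as in -vf scale_npp=2048:-1. Plain round() here; ffmpeg's own
    # rounding behaviour is an assumption, not verified.
    if dst_w == -1:
        dst_w = round(dst_h * src_w / src_h)
    elif dst_h == -1:
        dst_h = round(dst_w * src_h / src_w)
    return dst_w, dst_h

print(auto_dim(1920, 1080, dst_w=2048))  # (2048, 1152)
print(auto_dim(1920, 1080, dst_h=720))   # (1280, 720)
```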

ffmpeg with cuda support -- nvenc API version mismatch

Building ffmpeg with CUDA support as before, transcoding now fails with:
Driver does not support the required nvenc API version. Required: 9.1 Found: 8.1
According to this post, it's a header issue:

In nv-codec-headers, check the tags and branches.
Check out sdk/8.1.
make uninstall to remove the previous install.
make && sudo make install

Then reconfigure and rebuild ffmpeg,
and it works.


Some old articles mention nvresize, a hw-accelerated scaling feature that required an NVIDIA patch; it can no longer be found.
This post seems to explain it, from ...

1. Hardware-accelerated encoders: In the case of NVIDIA, NVENC is supported and implemented via the h264_nvenc and the hevc_nvenc wrappers. See this answer on how to tune them, and any limitations you may run into depending on the generation of hardware you're on.

2. Hardware-accelerated filters: Filters that perform duties such as scaling and post-processing (deinterlacing, etc) are available in FFmpeg, and some implementations are hardware-accelerated. For NVIDIA, the following filters can take advantage of hardware-acceleration:

(a). scale_cuda: This is a scaling filter analogous to the generic scale filter, implemented in CUDA. Its dependency is the ffnvcodec project, the headers needed to also enable the NVENC-based encoders. When the ffnvcodec headers are present, the respective filters dependent on it (scale_cuda and yadif_cuda) will be automatically enabled. In production, it may be wise to deprecate this filter in favor of scale_npp as it has a very limited set of options.

(b). scale_npp: This is a scaling filter implemented in NVIDIA's Performance Primitives. Its primary dependency is the CUDA SDK, and it must be explicitly enabled by passing --enable-libnpp, --enable-cuda-nvcc and --enable-nonfree flags to ./configure at compile time when building FFmpeg from source. Use this filter in place of scale_cuda wherever possible.

(c). yadif_cuda: This is a deinterlacer, implemented in CUDA. Its dependency, as stated above, is the ffnvcodec package of headers.

(d). All OpenCL-based filters: All NVENC-capable GPUs supported by both the mainline NVIDIA driver and the CUDA SDK implement OpenCL support. I started this section with this clarification because there's news in the wind that NVIDIA will be deprecating mobile Kepler GPUs in their mainline driver, relegating them to Legacy support status. For this reason, if you're on such a platform, take this into consideration.

To enable these filters, pass --enable-opencl to FFmpeg's ./configure script at build time. Note that this requires the OpenCL headers to be present on your system, and can be safely satisfied by your package manager on whatever Linux distribution you're on. On other operating systems, your mileage may vary.

To see all OpenCL-based filters, run:
ffmpeg -h filters | grep opencl
A few notable examples being unsharp_opencl,avgblur_opencl, etc. See this wiki section for more options.
A note pertaining to performance with OpenCL filters: Please take into account any overheads that mechanisms introduced by filter chains such as hwupload and hwdownload may introduce into your pipeline, as uploading textures to and from system memory and the accelerator in question will affect performance, and so will format conversion operations (via the format filter) where needed/required. In this case, it may be beneficial to take advantage of the hwmap filter, and deriving contexts where applicable. For instance, VAAPI has a mechanism that allows for OpenCL device derivation and reverse mapping via hwmap, if the cl_intel_va_api_media_sharing OpenCL extension is present. This is typically provided by the Beignet ICD, and is absent in others, such as the newer Neo OpenCL driver.

3. Hardware-accelerated decoders (and their associated wrappers): Depending on your input source, and the capabilities of your NVIDIA GPU, based on generation, you may also tap into hardware accelerations based on either CUVID or NVDEC. These methods differ in how they handle textures in-flight on the accelerator, and it is wise to evaluate other factors, such as VRAM utilization, when they are in use. Typically, you can take advantage of the CUVID-based hwaccels for operations such as deinterlacing, if so desired. See their usage via:
ffmpeg -h decoder=h264_cuvid
ffmpeg -h decoder=hevc_cuvid
ffmpeg -h decoder=mpeg2_cuvid
However, beware that handling MBAFF encoded content with these decoders, where double deinterlacing is required, is not advisable as NVIDIA has not yet implemented MBAFF support in the backend. Take a look at this thread for more on the same.

In closing: It is wise to evaluate where and when hardware accelerated offloading (filtering, encoding and decoding) offers an advantage or an acceptable trade-off (in quality, feature support and reliability) in your pipeline prior to deployment in production. This is a vendor-neutral approach when deciding what and when to offload parts of your pipeline, and the same applies to NVIDIA's solutions.

For more information, refer to the hardware acceleration entry in FFmpeg's wiki.

Samples demonstrating the use of hardware-accelerated filtering, encoding and decoding based on the notes above:

1. Demonstrate the use of 1:N encoding with NVENC:

The following assumption is made: The test-bed only has one NVENC-capable GPU present, a simple GTX 1070. For this reason I'm limited to two simultaneous NVENC sessions, and that is taken into account with the snippets below. Be warned that cases needing to utilize multiple NVENC-capable GPUs will need the command line(s) modified as appropriate.

My sample files are in ~/Desktop/src

I'll be working with a sample file as shown below:
ffprobe -i deint-testfile.mkv -show_format -hide_banner -show_streams

2019/11/4

cuda docker images .. again, for the nvcc command (devel image).

Last time it turned out that inside nvidia/cuda:10.1-base, nvidia-smi works but there is no nvcc command.
After installing the Toolkit manually, ffmpeg built OK, but at runtime libavcodec.so failed to load, complaining the nvidia driver version was wrong (it was actually correct).

The cuda container tags list shows the contents of every image,
and each link leads to its Dockerfile.
The guess is that only nvidia/cuda:10.1-devel has nvcc.
Test it:
docker run --gpus all -it nvidia/cuda:10.1-devel bash
..
:/# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
The cuda-10.1 directory also shows up under /usr/local.

Same result though: libnvidia-encode.so is not found.

On the host, it's under /usr/lib/:
:/usr/lib$ sudo find . -type f -name 'libnvidia-en*'
./i386-linux-gnu/libnvidia-encode.so.435.21
./x86_64-linux-gnu/libnvidia-encode.so.435.21

Maybe follow this post and add the CAPABILITIES option
-- no use; that's the old nvidia-docker.

This Dockerfile might be worth a try...

Finally, in nvidia-docker's own documentation: Usage
For a Dockerfile using the NVIDIA Video Codec SDK, you should use:

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,video,utility
So, following that, download the Dockerfile for cuda:10.1-base,
add video to NVIDIA_DRIVER_CAPABILITIES,
put it in a folder, and in that folder run:
docker build -t cudaffmpeg .
Check with docker images that cudaffmpeg got created.
Result: libnvidia-encode.so is now there, but nvcc is not...

The devel Dockerfile should be the reference,
so merge the two Dockerfiles...
Looking at the FROM relationships between the Dockerfiles for these tags:
base -- runtime -- devel
So just take devel directly, and use -e to override the Dockerfile's ENV NVIDIA_DRIVER_CAPABILITIES:
docker run -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility nvidia/cuda:10.1-devel bash
nvcc, nvidia-smi, and libnvidia-encode.so are all there.

Following the previous two entries, start from this docker image and build ffmpeg; NVIDIA hw acceleration works.

2019/11/1

worklog -- prepare docker container for ffmpeg cuda

Test ffmpeg CUDA with docker:
docker
  • run as a daemon, log in via ssh
  • the ssh login uid/gid must match the host's
  • use a host folder
  • must support cuda
To support cuda, use NVIDIA's docker image,
and start it with "--gpus all":
docker run -idt --gpus all -v ~/dockerfolder:/dockerfolder -p 8022:22 nvidia/cuda:10.1-base
docker exec -it container-name bash
Once inside:
apt-get update
apt-get install openssh-server vim sudo
/etc/init.d/ssh start
addgroup --gid 1001 myname
adduser --uid 1001 --gid 1001 myname
vi /etc/group  -- add myname to sudo and root group
With that, logging in via ssh -p 8022 hostname works.

Install the needed compiler and tools:
sudo apt-get install build-essential pkg-config

Install the CUDA SDK (Toolkit):
chmod a+x cuda_10.1.243_418.87.00_linux.run
sudo ./cuda_10.1.243_418.87.00_linux.run
...
Then, per the installer's instructions, add the bin and .so paths:
/etc/profile.d/cuda101.sh
export PATH=$PATH:/usr/local/cuda-10.1/bin
/etc/ld.so.conf.d/cuda-10.1.conf
/usr/local/cuda-10.1/target/x86_64-linux/lib
/usr/local/cuda-10.1/lib64
After editing ld.so.conf.d, run ldconfig once with sudo to update the ld cache.
As for PATH: without a reboot, source the profile script once manually.

Following the NVIDIA ffmpeg guide: ffmpeg has split out the NVIDIA-specific headers, so they must be downloaded separately:
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
make
sudo make install

Download and build ffmpeg:
git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
git checkout n4.2.1
git checkout -b 4.2.1

./configure --prefix=/dockerfolder/cudaffmpeg --enable-cuda-nvcc --enable-cuvid --enable-nvenc
 --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include
 --extra-ldflags=-L/usr/local/cuda/lib64
make -j10
make install
It installs under /dockerfolder/cudaffmpeg,
so the shared-library path also has to be added:
export LD_LIBRARY_PATH=/dockerfolder/cudaffmpeg/lib
Then test it:
:/dockerfolder/cudaffmpeg/bin$ ./ffmpeg -version
ffmpeg version N-95607-gb414cff630 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
configuration: --prefix=/dockerfolder/cudaffmpeg --enable-cuda-nvcc --enable-cuvid --enable-nvenc
 --enable-nonfree --enable-libnpp --extra-cflags=-I/usr/local/cuda/include
 --extra-ldflags=-L/usr/local/cuda/lib64
libavutil      56. 35.101 / 56. 35.101
libavcodec     58. 60.100 / 58. 60.100
libavformat    58. 33.100 / 58. 33.100
libavdevice    58.  9.100 / 58.  9.100
libavfilter     7. 66.100 /  7. 66.100
libswscale      5.  6.100 /  5.  6.100
libswresample   3.  6.100 /  3.  6.100

Test: a 20-second mp4 transcoded to h264.
time ~/cudaffmpeg/bin/ffmpeg -i testvideo.mp4 -an -vcodec h264_nvenc  testvideo.h264
...
real    0m4.157s
user    0m14.736s
sys     0m0.214s

time ~/cudaffmpeg/bin/ffmpeg -i testvideo.mp4 -an -vcodec libx264  testvideo0.h264
...
real    0m24.757s
user    2m19.679s
sys     0m2.956s
That's roughly 6x faster in wall-clock time, and close to 10x less CPU (user) time...
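For the record, the arithmetic behind that comparison, from the two time outputs above:

```python
# Speedup ratios from the two runs above: real (wall-clock) and user (CPU).
real_ratio = 24.757 / 4.157               # libx264 real / h264_nvenc real
user_ratio = (2 * 60 + 19.679) / 14.736   # libx264 user / h264_nvenc user
print(round(real_ratio, 1), round(user_ratio, 1))  # 6.0 9.5
```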

Some other general ffmpeg options...
scale:
ffmpeg -i testvideo.h264 -vf scale=320:240 -vcodec h264_nvenc test320.h264

Running ffmpeg nvenc in this docker container fails: libnvidia-encode.so is not found. The method from this post is required: add the capabilities parameter with video enabled.