2023/9/21

Docker: running NVIDIA CUDA-ready images

Ref: NVIDIA publishes a whole set of prebuilt Docker images with CUDA support: nvidia in docker image

However, for Docker to support these CUDA-ready images, nvidia-container-toolkit has to be installed:
sudo apt install nvidia-container-toolkit
Then configure runtime support:
sudo nvidia-ctk runtime configure --runtime=docker
Then restart the Docker daemon:
sudo systemctl restart docker
After this, the docker command supports the --gpus all option.
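
As a quick sanity check after the restart, one can confirm that nvidia-ctk really registered the runtime. The sketch below assumes the default Docker config path /etc/docker/daemon.json and that the runtime is registered under the name "nvidia"; the helper name has_nvidia_runtime is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Docker daemon config mentions
# an "nvidia" runtime. Falls back to the default config path (an assumption)
# when no path is passed.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered in daemon.json"
else
    echo "nvidia runtime not found; try re-running nvidia-ctk runtime configure"
fi
```

This only inspects the config file; it does not prove the daemon was restarted or that a GPU is actually reachable from a container.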

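Also, the CUDA version of the image must not be newer than the CUDA version on the host. When I actually tested this with torch.rand(2000,128,device=torch.device('cuda')), it complained that a newer function was being used.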
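
The version constraint above can be checked before pulling an image. This is a minimal sketch, assuming the image follows the nvidia/cuda tag convention (version first, then a dash) and that the "CUDA Version" field in the nvidia-smi header is what counts as the host version. The helper names image_cuda_version, major_minor, version_le and host_cuda_version are hypothetical, and only major.minor is compared.

```shell
#!/bin/sh
# Hypothetical helpers for comparing an image's CUDA version with the host's.

# nvidia/cuda:11.7.1-cudnn8-devel-ubuntu18.04 -> 11.7.1
image_cuda_version() {
    tag="${1##*:}"        # drop everything up to the last ':'
    echo "${tag%%-*}"     # drop everything from the first '-'
}

# 11.7.1 -> 11.7 (nvidia-smi reports only major.minor)
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Succeed if version $1 <= version $2 (relies on GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Read the "CUDA Version: X.Y" field from the nvidia-smi header.
host_cuda_version() {
    nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

image="nvidia/cuda:11.7.1-cudnn8-devel-ubuntu18.04"
if command -v nvidia-smi >/dev/null 2>&1; then
    host="$(host_cuda_version)"
    img="$(major_minor "$(image_cuda_version "$image")")"
    if version_le "$img" "$host"; then
        echo "OK: image CUDA $img <= host CUDA $host"
    else
        echo "Warning: image CUDA $img is newer than host CUDA $host"
    fi
fi
```

On a machine without nvidia-smi the final check is simply skipped.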


Here is a Dockerfile that I came across in yolov5s_android:
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu18.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update --fix-missing
RUN apt-get install -y python3 python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/cu110/torch_stable.html

# install openvino
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    cpio \
    sudo \
    lsb-release && \
    rm -rf /var/lib/apt/lists/*
# Add a user that UID:GID will be updated by vscode
ARG USERNAME=developer
ARG GROUPNAME=developer
ARG UID=1000
ARG GID=1000
ARG PASSWORD=developer
RUN groupadd -g $GID $GROUPNAME && \
    useradd -m -s /bin/bash -u $UID -g $GID -G sudo $USERNAME && \
    echo $USERNAME:$PASSWORD | chpasswd && \
    echo "$USERNAME   ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
USER $USERNAME
ENV HOME=/home/developer

The original instructions are:
git clone --recursive https://github.com/lp6m/yolov5s_android
cd yolov5s_android
docker build ./ -f ./docker/Dockerfile  -t yolov5s_android
docker run -it --gpus all -v `pwd`:/workspace yolov5s_android bash
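
The Dockerfile above declares UID and GID as build arguments (defaulting to 1000), so files written to the bind-mounted /workspace end up owned by that ID. If the host user has a different ID, the build arguments can be overridden with docker's --build-arg flag. The helper name uid_gid_build_args below is hypothetical, and the sketch only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print --build-arg flags that map the image's user
# to a given UID/GID (defaults to the caller's own IDs).
uid_gid_build_args() {
    uid="${1:-$(id -u)}"
    gid="${2:-$(id -g)}"
    echo "--build-arg UID=$uid --build-arg GID=$gid"
}

# Print the full build command so it can be reviewed before running it.
echo "docker build ./ -f ./docker/Dockerfile $(uid_gid_build_args) -t yolov5s_android"
```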



2024/11/22 update

ref

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    
sudo apt update

sudo apt install nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker

sudo systemctl restart docker
To test whether the installation succeeded, run nvidia-smi through the NVIDIA Docker runtime:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
