2021/2/8

Failed to initialize NVML: Driver/library version mismatch

Running nvidia-smi fails with the error message: Failed to initialize NVML: Driver/library version mismatch

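This happens after an update. The NVIDIA kernel driver that is still loaded no longer matches the version of the newly installed NVIDIA user-space library, so the nvidia driver has to be reloaded. A reboot usually takes care of it.
Otherwise, the nvidia driver has to be reloaded manually.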
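
To confirm this cause before unloading anything, one can compare the version of the module that is currently loaded with the version of the module installed on disk. This is a minimal sketch, assuming the loaded module exposes its version in /sys/module/nvidia/version and that `modinfo -F version nvidia` reports the installed one. Both are typical for the proprietary driver but are not guaranteed on every distribution. The function name nvidia_version_check and the override variables NVIDIA_LOADED_VERSION_FILE and NVIDIA_INSTALLED_VERSION are hypothetical. The overrides exist only so the function can be tried without a GPU.

```shell
# Compare the loaded nvidia kernel module with the one installed on disk.
# Exit status: 0 = versions match, 1 = mismatch (reload needed), 2 = not loaded.
# NVIDIA_LOADED_VERSION_FILE / NVIDIA_INSTALLED_VERSION are hypothetical
# overrides for testing; by default the real system sources are used.
nvidia_version_check() {
  loaded_file="${NVIDIA_LOADED_VERSION_FILE:-/sys/module/nvidia/version}"
  if [ ! -r "$loaded_file" ]; then
    echo "nvidia module not loaded"
    return 2
  fi
  loaded="$(cat "$loaded_file")"
  # Assumption: modinfo reports the version of the nvidia.ko found on disk
  installed="${NVIDIA_INSTALLED_VERSION:-$(modinfo -F version nvidia 2>/dev/null)}"
  if [ "$loaded" = "$installed" ]; then
    echo "match: $loaded"
    return 0
  fi
  echo "mismatch: loaded=$loaded installed=$installed"
  return 1
}
```

A "mismatch" result right after a driver upgrade points to the situation described above. A "match" suggests the error has some other cause.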

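The driver modules depend on one another, so they have to be removed with rmmod one at a time, in dependency order.
The final target is the nvidia module itself. Once all of them are removed, running nvidia-smi again automatically loads the correct driver.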

charles-chang@penguin1:~$ sudo rmmod nvidia_drm
charles-chang@penguin1:~$ sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use by: nvidia_uvm nvidia_modeset
charles-chang@penguin1:~$ sudo rmmod nvidia_uvm
charles-chang@penguin1:~$ sudo rmmod nvidia_modeset
charles-chang@penguin1:~$ sudo rmmod nvidia
charles-chang@penguin1:~$ sudo nvidia-smi
Mon Feb  8 23:30:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:01:00.0 Off |                  N/A |
| 24%   42C    P0    30W / 280W |      0MiB / 24218MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
charles-chang@penguin1:~$ lsmod | grep nvidia
nvidia_uvm            983040  0
nvidia_drm             49152  0
nvidia_modeset       1179648  1 nvidia_drm
nvidia              19701760  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        172032  1 nvidia_drm
drm                   401408  3 drm_kms_helper,nvidia_drm
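
The removal order does not have to be found by trial and error. The last lsmod column ("Used by") lists which modules still use each module, and that column already forms a dependency graph. This is a minimal sketch, assuming lsmod-format input and assuming that only modules whose names start with nvidia should be unloaded, so drm and drm_kms_helper are left alone. It feeds "user before used" pairs to the standard tsort utility. The function name nvidia_unload_order is hypothetical.

```shell
# Print nvidia* kernel modules in an order that is safe to pass to rmmod:
# every module is printed before any module it depends on.
# Input: lsmod-style text on stdin (full lsmod output or `lsmod | grep nvidia`).
nvidia_unload_order() {
  awk '$1 ~ /^nvidia/ {
         # self-pair: registers the module even if nothing uses it
         print $1, $1
         # column 4 ("Used by") lists the modules that use this one
         if (NF >= 4) {
           n = split($4, users, ",")
           for (i = 1; i <= n; i++)
             if (users[i] ~ /^nvidia/)
               print users[i], $1   # the user must be removed first
         }
       }' | tsort
}

# Example (needs root; stops at the first module that is still in use):
# for m in $(lsmod | nvidia_unload_order); do sudo rmmod "$m" || break; done
# sudo nvidia-smi
```

Run against the lsmod output above, any order it prints puts nvidia_drm before nvidia_modeset and puts both nvidia_uvm and nvidia_modeset before nvidia. This matches the constraints the manual session ran into. rmmod will still refuse a module while a process such as Xorg or a CUDA job holds it, so those processes have to be stopped first.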
