How to remove all CUDA libraries and reinstall them
CUDA is updated frequently, so it may stop working if you are not careful when updating your system.
The following steps resolve this situation.
Uncheck (disable) the NVIDIA repository, for example under Software & Updates > Other Software.
sudo apt update
Remove the CUDA files.
sudo rm -rf /usr/local/cuda*
sudo apt update && sudo apt upgrade -y
Make sure the system is now using the Nouveau display driver.
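One way to confirm which kernel driver the GPU is bound to is to read the output of `lspci -k`. The function below is a minimal sketch, assuming `lspci` (from pciutils) is installed and that the card is listed as a "VGA compatible controller" or "3D controller"; the name `active_gpu_driver` is hypothetical.

```bash
# active_gpu_driver (hypothetical helper): reads `lspci -k` output on stdin and
# prints the "Kernel driver in use" value for each VGA/3D controller entry.
active_gpu_driver() {
  awk '
    # A line that starts without whitespace begins a new PCI device entry.
    /^[^ \t]/ { in_gpu = ($0 ~ /VGA compatible controller|3D controller/) }
    # Indented detail lines belong to the most recent device entry.
    in_gpu && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      print
    }
  '
}

# Usage: expected to print "nouveau" once the NVIDIA driver is no longer loaded.
# lspci -k | active_gpu_driver
```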
Delete Docker images
List the current Docker images.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 7015867e0ff7 2 weeks ago 18.2GB
tokaikaoninsho/face01_no_gpu 1.4.12 dd34d05422c5 2 weeks ago 2.53GB
tokaikaoninsho/face01_gpu 1.4.12 61d32d36b9ab 2 weeks ago 19.2GB
tokaikaoninsho/face01_no_gpu 1.4.11 6ad4ba3cbe88 3 weeks ago 3.65GB
tokaikaoninsho/face01_gpu 1.4.11 682da444845a 3 weeks ago 20.3GB
face01_no_gpu 1.4.11 efc3845d390a 3 weeks ago 2.5GB
face01_gpu 1.4.11 7398d955d905 3 weeks ago 19.2GB
<none> <none> 6c61d0364450 3 weeks ago 1.66GB
<none> <none> b2e38e65b233 3 weeks ago 18.2GB
ubuntu 20.04 d5447fc01ae6 2 months ago 72.8MB
tensorflow/tensorflow latest-gpu-jupyter cf6cb74c9ec4 5 months ago 6.19GB
nvidia/cuda 11.0.3-base-ubuntu20.04 8017f5c31b74 6 months ago 122MB
Delete the Docker images by ID.
$ docker rmi 7015867e0ff7 dd34d05422c5 61d32d36b9ab 6ad4ba3cbe88 682da444845a efc3845d390a 7398d955d905 6c61d0364450 b2e38e65b233 d5447fc01ae6 cf6cb74c9ec4 8017f5c31b74
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> 7015867e0ff7 2 weeks ago 18.2GB
tensorflow/tensorflow latest-gpu-jupyter cf6cb74c9ec4 5 months ago 6.19GB
nvidia/cuda 11.0.3-base-ubuntu20.04 8017f5c31b74 6 months ago 122MB
$ docker rmi -f nvidia/cuda:11.0.3-base-ubuntu20.04
Untagged: nvidia/cuda:11.0.3-base-ubuntu20.04
Untagged: nvidia/cuda@sha256:57455121f3393b7ed9e5a0bc2b046f57ee7187ea9ec562a7d17bf8c97174040d
Remove the images that could not be deleted this way with the -f option.
$ docker rmi -f tensorflow/tensorflow:latest-gpu-jupyter
Untagged: tensorflow/tensorflow:latest-gpu-jupyter
Untagged: tensorflow/tensorflow@sha256:a72deb34d32e26cf4253608b0e86ebb4e5079633380c279418afb5a131c499d6
Deleted: sha256:cf6cb74c9ec4ff92634514468a6dd2323dead73720b58e1700b9478557668b3d
$ docker rmi -f 7015867e0ff7
Deleted: sha256:7015867e0ff7461e1776bfa43f7383f1a6ec748817e8afb60b04fce9f2b40cd8
Deleted: sha256:ae77d65add3126995cbfb38f7e8b36e12fa5f23de0ab7a9723b2a752cca3c281
Deleted: sha256:82eb8ba78e6c6d7f349188ba006b3e9f35b003e1682f3820355ab839bd5acd04
Deleted: sha256:f946ae5db3ab83a4da53d8791d7c57e7f6ad39bda37527e0338f82524791578f
Deleted: sha256:43707fb49b26719b6c92faf6af9fb2e160efa3ea9151cdc43c7fb903e61e7
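The image IDs above were copied by hand. As a minimal sketch (the helper name `image_refs` is hypothetical, and it assumes the default `docker images` column layout shown above), the listing can be turned into arguments for `docker rmi`: tagged images are referenced as `repository:tag`, as in the `-f` examples, and `<none>` images by IMAGE ID.

```bash
# image_refs (hypothetical helper): reads `docker images` output on stdin and
# prints one reference per image: "repository:tag", or the IMAGE ID when the
# repository or tag is <none>. The header line is skipped.
image_refs() {
  awk 'NR > 1 {
    if ($1 == "<none>" || $2 == "<none>") print $3
    else print $1 ":" $2
  }'
}

# Usage (GNU xargs -r skips docker rmi when there is nothing to remove):
# docker images | image_refs | xargs -r docker rmi
```

Docker also provides `docker image prune -a`, which removes every image not used by an existing container; the helper above is only useful if you want to review or filter the list first.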
Download NVIDIA's CUDA repository keyring package, then install it.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
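The URL above is specific to Ubuntu 20.04 (`ubuntu2004`). A minimal sketch of deriving that directory name from `/etc/os-release` follows; the helper name is hypothetical, and it assumes NVIDIA publishes a repository with the same layout and the same `cuda-keyring_1.0-1_all.deb` file name for your release, which you should check on the download site.

```bash
# cuda_repo_distro (hypothetical helper): turns os-release ID and VERSION_ID
# into the directory name used in NVIDIA's CUDA repository URL,
# e.g. "ubuntu" + "20.04" -> "ubuntu2004".
cuda_repo_distro() {
  printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d .)"
}

# Usage (assumes an x86_64 machine, as in the URL above):
# . /etc/os-release
# distro=$(cuda_repo_distro "$ID" "$VERSION_ID")
# wget "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/x86_64/cuda-keyring_1.0-1_all.deb"
```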
Reinstall the libraries
Re-enable (tick) the NVIDIA repository.
sudo apt update
sudo apt install -y nvidia-cuda-toolkit
sudo apt install -y libcudnn8
sudo apt install -y libcudnn8-dev
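To confirm what was installed, `nvcc --version` reports the CUDA toolkit release and `dpkg-query -W libcudnn8` reports the cuDNN package version. The helper below is a minimal sketch (hypothetical name) that extracts the release number from a line of the form `Cuda compilation tools, release X.Y, VX.Y.Z`.

```bash
# nvcc_release (hypothetical helper): reads `nvcc --version` output on stdin
# and prints the release number, e.g. "11.7".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage:
# nvcc --version | nvcc_release
# dpkg-query -W libcudnn8
```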
Reinstall the driver
List the available drivers.
sudo ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:03.1/0000:08:00.0 ==
modalias : pci:v000010DEd00002182sv00001462sd00008D90bc03sc00i00
vendor : NVIDIA Corporation
model : TU116 [GeForce GTX 1660 Ti]
driver : nvidia-driver-450 - third-party non-free
driver : nvidia-driver-525-open - distro non-free recommended
driver : nvidia-driver-460 - third-party non-free
driver : nvidia-driver-515 - third-party non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-455 - third-party non-free
driver : nvidia-driver-470 - third-party non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-515-open - distro non-free
driver : nvidia-driver-520 - third-party non-free
driver : nvidia-driver-495 - third-party non-free
driver : nvidia-driver-515-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-510 - third-party non-free
driver : nvidia-driver-465 - third-party non-free
driver : nvidia-driver-525 - third-party non-free
driver : nvidia-driver-525-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
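The package marked `recommended` in this listing (here `nvidia-driver-525-open`) can be picked out with a short filter. This is a minimal sketch with a hypothetical helper name; the steps below install `nvidia-driver-515` instead, so treat the recommendation as a hint rather than a requirement.

```bash
# recommended_driver (hypothetical helper): reads `ubuntu-drivers devices`
# output on stdin and prints the package on the "driver" line marked
# "recommended" (field 3, after "driver" and ":").
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3 }'
}

# Usage:
# ubuntu-drivers devices | recommended_driver
```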
Install the driver. This example uses nvidia-driver-515.
sudo apt install nvidia-driver-515
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
dctrl-tools dkms libegl-mesa0:i386 libegl1:i386 libgbm1:i386 libgles2:i386 libnvidia-cfg1-515
libnvidia-common-515 libnvidia-compute-515 libnvidia-compute-515:i386 libnvidia-decode-515
libnvidia-decode-515:i386 libnvidia-encode-515 libnvidia-encode-515:i386 libnvidia-extra-515
libnvidia-fbc1-515 libnvidia-fbc1-515:i386 libnvidia-gl-515 libnvidia-gl-515:i386 libopengl0:i386
libwayland-server0:i386 nvidia-compute-utils-515 nvidia-dkms-515 nvidia-kernel-common-515
nvidia-kernel-source-515 nvidia-prime nvidia-settings nvidia-utils-515 screen-resolution-extra
xserver-xorg-video-nvidia-515
Suggested packages:
debtags menu
The following packages will be REMOVED:
libnvidia-compute-418-server
The following NEW packages will be installed:
dctrl-tools dkms libegl-mesa0:i386 libegl1:i386 libgbm1:i386 libgles2:i386 libnvidia-cfg1-515
libnvidia-common-515 libnvidia-compute-515 libnvidia-compute-515:i386 libnvidia-decode-515
libnvidia-decode-515:i386 libnvidia-encode-515 libnvidia-encode-515:i386 libnvidia-extra-515
libnvidia-fbc1-515 libnvidia-fbc1-515:i386 libnvidia-gl-515 libnvidia-gl-515:i386 libopengl0:i386
libwayland-server0:i386 nvidia-compute-utils-515 nvidia-dkms-515 nvidia-driver-515 nvidia-kernel-common-515
nvidia-kernel-source-515 nvidia-prime nvidia-settings nvidia-utils-515 screen-resolution-extra
xserver-xorg-video-nvidia-515
0 upgraded, 31 newly installed, 1 to remove and 0 not upgraded.
Need to get 334 MB of archives.
After this operation, 778 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Reload ~/.bashrc.
. ~/.bashrc
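The contents of ~/.bashrc are not shown here. If it adds CUDA directories to PATH (an assumption), keep in mind that /usr/local/cuda* was deleted earlier, so such entries may point at directories that no longer exist. A minimal sketch of a guard (hypothetical function name) that only prepends a directory when it exists and is not already listed:

```bash
# path_prepend (hypothetical helper for ~/.bashrc): prepends a directory to
# PATH only if the directory exists and is not already present in PATH.
path_prepend() {
  [ -d "$1" ] || return 0
  case ":$PATH:" in
    *":$1:"*) ;;          # already listed: leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
}

# Example entry: skipped silently if /usr/local/cuda/bin does not exist.
path_prepend /usr/local/cuda/bin
export PATH
```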
Install Docker and nvidia-docker2
sudo apt-get update && sudo apt-get upgrade -y \
&& curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
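sudo usermod -aG docker <user_name>
Replace <user_name> with your login name. The new group membership takes effect after you log out and log back in.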
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt-get update --fix-missing
sudo apt install -y nvidia-docker2
sudo systemctl restart docker
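The `sed` step in the repository setup above rewrites each `deb https://` line in NVIDIA's list file so that apt only accepts packages from that source when they are signed with the downloaded keyring. The same rewrite, wrapped in a hypothetical helper and applied to an illustrative source line, looks like this:

```bash
# add_signed_by (hypothetical helper): reads apt source lines on stdin and adds
# a [signed-by=KEYRING] option to every "deb https://" entry, using the same
# sed expression as the setup command above.
add_signed_by() {
  sed "s#deb https://#deb [signed-by=$1] https://#g"
}

# Usage with an illustrative source line:
# printf '%s\n' 'deb https://example.com/repo /' \
#   | add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```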
Check the result with the nvidia-smi command.
nvidia-smi
Sun Feb 12 09:28:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01 Driver Version: 515.86.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:08:00.0 On | N/A |
| 41% 27C P8 11W / 120W | 757MiB / 6144MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1476 G /usr/lib/xorg/Xorg 23MiB |
| 0 N/A N/A 2463 G /usr/lib/xorg/Xorg 143MiB |
| 0 N/A N/A 2632 G /usr/bin/gnome-shell 30MiB |
| 0 N/A N/A 2845 G ...b/thunderbird/thunderbird 90MiB |
| 0 N/A N/A 2854 G /usr/lib/firefox/firefox 235MiB |
| 0 N/A N/A 6510 C .../bin/Lightning/bin/python 159MiB |
| 0 N/A N/A 9838 G ...RendererForSitePerProcess 60MiB |
+-----------------------------------------------------------------------------+
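To check the versions without reading the whole table, the header line can be parsed. Note that the `CUDA Version` shown by nvidia-smi is the highest CUDA version the driver supports, not necessarily the version of the toolkit installed with apt. A minimal sketch with a hypothetical helper name:

```bash
# smi_versions (hypothetical helper): reads nvidia-smi output on stdin and
# prints the driver version and CUDA version found on its header line.
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Usage:
# nvidia-smi | smi_versions
```

For the driver version alone, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` avoids text parsing altogether.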
After that, reinstall any GPU-related Python libraries if necessary. Once you have confirmed that everything works, the work is complete.