Install NVIDIA GPU driver on Ubuntu
In this tutorial, the supported OS versions are Ubuntu 20.04 and 18.04.
In addition, the tutorial covers not only a fresh installation but also upgrading the driver version.
Preparing
Check nouveau (Important!)
Check whether nouveau (the open-source driver for NVIDIA GPUs) is loaded. If it is, you need to turn it off; otherwise you will run into issues after installing the NVIDIA GPU driver.
Command:
lsmod | grep nouveau
If the command prints nothing, nouveau is not loaded.
If it prints something, follow the Turn off Nouveau instructions to turn it off.
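A common way to turn nouveau off is to blacklist the module and rebuild the initramfs. The sketch below is not necessarily what the linked Turn off Nouveau instructions do: the helper names are hypothetical, and the file name /etc/modprobe.d/blacklist-nouveau.conf is an assumption (the conventional name).

```shell
# Hypothetical helpers for checking and disabling nouveau.

# Succeeds if the `lsmod` output on stdin lists the nouveau module.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Writes a modprobe blacklist for nouveau. The default path is an assumed
# conventional name; pass another path for a dry run. Writing under
# /etc/modprobe.d requires root.
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}
```

Typical use: lsmod | nouveau_loaded && echo "nouveau is loaded". After writing the file from a root shell, run sudo update-initramfs -u and reboot; lsmod | grep nouveau should then print nothing.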
Check kernel version
This shows the GNU gcc compiler version that was used to compile the running kernel:
cat /proc/version
Check gcc version:
gcc --version
If the installed version differs from the one used to compile the running kernel, it is better to switch gcc to that version, because the driver's kernel module is built for the running kernel.
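The comparison above can be scripted. This is a sketch with hypothetical helper names; it assumes the two /proc/version formats shown in the comments (older kernels print "gcc version X.Y.Z", newer ones "gcc (Ubuntu ...) X.Y.Z"), and it compares exact version strings.

```shell
# Prints the gcc version recorded in a /proc/version string
# (defaults to the running kernel's /proc/version).
kernel_gcc_version() {
  local s v
  s="${1:-$(cat /proc/version)}"
  # Older format: "... (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) ..."
  v="$(printf '%s\n' "$s" | sed -nE 's/.*gcc version ([0-9]+(\.[0-9]+)*).*/\1/p')"
  if [ -z "$v" ]; then
    # Newer format: "... (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld ...) ..."
    v="$(printf '%s\n' "$s" | sed -nE 's/.*gcc \([^)]*\) ([0-9]+(\.[0-9]+)*).*/\1/p')"
  fi
  printf '%s\n' "$v"
}

# Compares the kernel's gcc version ($1) with the installed one ($2).
compare_gcc_versions() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: kernel built with gcc $1, installed gcc is $2"
  fi
}
```

Usage: compare_gcc_versions "$(kernel_gcc_version)" "$(gcc -dumpfullversion -dumpversion)".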
Check the GPU configuration
sudo lshw -C display
If you do not want to install the driver via the commands below, you can download it from the NVIDIA website instead.
Let's start
Uninstall GPU driver
(Only needed if you have installed it before and want to upgrade the GPU driver.)
sudo apt-get --purge remove "nvidia*"
sudo apt autoremove
sudo apt-get --purge remove "*cublas*" "cuda*"
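Before purging, it can help to preview what will go. apt-get -s (simulate) shows exactly what a purge would remove, e.g. apt-get -s --purge remove "nvidia*". The hypothetical helper below instead lists installed packages whose names contain nvidia, cuda or cublas; apt's own pattern matching may select a slightly different set, so treat it as a rough preview.

```shell
# Reads `dpkg -l` output on stdin and prints installed ("ii") packages whose
# names contain nvidia, cuda or cublas.
installed_gpu_packages() {
  awk '$1 == "ii" && $2 ~ /nvidia|cuda|cublas/ { print $2 }'
}
```

Usage: dpkg -l | installed_gpu_packages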
Install GPU driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices
Output:
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001D81sv000010DEsd00001218bc03sc00i00
vendor : NVIDIA Corporation
model : GV100 [TITAN V]
driver : nvidia-driver-470-server - distro non-free recommended
driver : nvidia-driver-460-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-440 - third-party free
driver : nvidia-driver-460 - third-party free
driver : nvidia-driver-450 - third-party free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-465 - third-party free
driver : nvidia-driver-455 - third-party free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-418 - third-party free
driver : nvidia-driver-470 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin
I chose version 470 (nvidia-driver-470-server, marked as recommended above):
sudo apt install nvidia-driver-470-server
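Instead of copying the package name by hand, the entry marked "recommended" can be picked out of the ubuntu-drivers devices output. The helper name is hypothetical and it assumes the "driver : <package> - ..." line format shown above. (sudo ubuntu-drivers autoinstall is the built-in way to install the recommended driver.)

```shell
# Reads `ubuntu-drivers devices` output on stdin and prints the first
# package marked "recommended" (line format: "driver : <pkg> - ... recommended").
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Usage: sudo apt install "$(ubuntu-drivers devices | recommended_driver)"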
After the installation finishes, reboot:
sudo reboot
Check it by running nvidia-smi:
nvidia-smi
Sun Oct 3 14:08:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA TITAN V Off | 00000000:01:00.0 Off | N/A |
| 34% 49C P8 33W / 250W | 163MiB / 12063MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1653 G /usr/lib/xorg/Xorg 63MiB |
| 0 N/A N/A 1965 G /usr/bin/gnome-shell 97MiB |
+-----------------------------------------------------------------------------+
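For scripts, the versions in the banner can be extracted. nvidia-smi also has a query mode, for example nvidia-smi --query-gpu=name,driver_version --format=csv,noheader; the hypothetical helper below parses the default table instead. Note that the "CUDA Version" in the banner is the highest CUDA version the driver supports, not necessarily an installed CUDA toolkit.

```shell
# Reads the default nvidia-smi table on stdin and prints "<driver> <cuda>".
smi_versions() {
  sed -nE 's/.*Driver Version: *([0-9.]+).*CUDA Version: *([0-9.]+).*/\1 \2/p'
}
```

Usage: nvidia-smi | smi_versions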
Troubleshooting
The repository 'xxx bionic Release' no longer has a Release file
E: The repository 'https://deb.nodesource.com/node_10.x bionic Release' no longer has a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Solution:
sudo apt install ca-certificates
Connection fail
You can try updating and upgrading first:
sudo apt-get update
sudo apt-get upgrade --fix-missing
sudo apt update
sudo apt upgrade
MOK Issue
During the installation you may be asked to type a password for MOK (Machine Owner Key) management. Do not worry about it; just follow the steps below.
After the installation finishes, reboot. You might see a blue screen titled "Perform MOK management". Follow these steps:
- Choose "Enroll MOK"
- Choose "continue"
- Go to "enroll the key", and choose "Yes"
- Type the password you set during the installation.
- Reboot.
Done.
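The MOK prompt appears when Secure Boot is enabled, because the driver's kernel module must be signed with a key the firmware trusts. mokutil --sb-state reports the current state; the hypothetical helper below checks that output so you know in advance whether to expect the enrollment screen.

```shell
# Reads `mokutil --sb-state` output on stdin; succeeds if Secure Boot is enabled.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}
```

Usage: if mokutil --sb-state | secure_boot_enabled; then echo "Expect the MOK screen after reboot"; fi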
Install CUDA
➤ CUDA version: 11.4 (for Ubuntu 18.04)
Steps:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
Then reboot, and add the CUDA folder to your .bashrc or .zshrc.
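The repository URLs in these steps embed the Ubuntu release as ubuntu1804 (and ubuntu2004 for 20.04). A small sketch with hypothetical helper names derives that tag from VERSION_ID in /etc/os-release and builds the .pin URL following the same pattern; it assumes NVIDIA keeps this URL layout for the release in question.

```shell
# Converts an /etc/os-release VERSION_ID such as "18.04" into "ubuntu1804".
cuda_repo_tag() {
  printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Builds the .pin URL using the same pattern as the URLs in this section.
cuda_pin_url() {
  local tag
  tag="$(cuda_repo_tag "$1")"
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/x86_64/cuda-%s.pin\n' "$tag" "$tag"
}
```

Usage: . /etc/os-release && wget "$(cuda_pin_url "$VERSION_ID")"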
➤ CUDA version: 11.6 (for Ubuntu 20.04)
Steps:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
Then reboot, and add the CUDA folder to your .bashrc or .zshrc.
For example:
# CUDA:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.6/lib64
export CUDA_INSTALL_DIR=/usr/local/cuda-11.6
export PATH=$PATH:/usr/local/cuda-11.6/bin
export CUDA_HOME=/usr/local/cuda-11.6
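To avoid adding the same lines twice when rerunning the setup, the exports can be appended only once. This is a sketch with a hypothetical function name: it writes a marker comment and skips the append if the marker is already present. CUDA_HOME is set to the single install directory.

```shell
# Appends CUDA environment exports for <cuda_dir> to <rc_file>
# (default ~/.bashrc) unless a marker line shows they were already added.
add_cuda_env() {
  local cuda_dir="$1"
  local rc_file="${2:-$HOME/.bashrc}"
  local marker="# CUDA: $cuda_dir"
  if grep -qxF "$marker" "$rc_file" 2>/dev/null; then
    return 0
  fi
  {
    printf '%s\n' "$marker"
    printf 'export CUDA_HOME=%s\n' "$cuda_dir"
    # $PATH and $LD_LIBRARY_PATH stay literal so they expand when the rc file is sourced.
    printf 'export PATH=$PATH:%s/bin\n' "$cuda_dir"
    printf 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:%s/lib64\n' "$cuda_dir"
  } >> "$rc_file"
}
```

Usage: add_cuda_env /usr/local/cuda-11.6 ~/.zshrc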
Install cudnn
Check here: https://github.com/chiehpower/Setup-deeplearning-tools#install-cudnn
- Copy the include folder files into /usr/local/cuda/include
- Copy the lib64 folder files into /usr/local/cuda/lib64
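The two copy steps can be written as one function. It assumes the extracted archive has include/ and lib64/ subdirectories, as the bullets describe; newer cuDNN archives may use different directory names. The function name is hypothetical. cp -P keeps the library symlinks intact, and chmod a+r makes the files readable for all users.

```shell
# Copies cuDNN headers and libraries from an extracted archive directory
# <src> into a CUDA installation (default /usr/local/cuda).
# Assumes <src>/include and <src>/lib64 exist; writing to /usr/local needs root.
install_cudnn_files() {
  local src="$1"
  local cuda_dir="${2:-/usr/local/cuda}"
  cp "$src"/include/cudnn*.h "$cuda_dir/include/" &&
    cp -P "$src"/lib64/libcudnn* "$cuda_dir/lib64/" &&
    chmod a+r "$cuda_dir"/include/cudnn*.h "$cuda_dir"/lib64/libcudnn*
}
```

Usage (from a root shell, in the directory where the archive was extracted): install_cudnn_files ./cuda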
Install TensorRT
tar -zxvf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar
cd TensorRT-8.2.0.6
python3 -m pip install python/tensorrt-8.2.0.6-cp36-none-linux_x86_64.whl
python3 -m pip install uff/uff-0.6.9-py2.py3-none-any.whl
python3 -m pip install graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
python3 -m pip install onnx_graphsurgeon/onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl
Put this line in your bashrc or zshrc.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(your path)/TensorRT-8.2.0.6/lib
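The cp36 wheel above targets Python 3.6, the default python3 on Ubuntu 18.04; on 20.04 the default is Python 3.8, which needs the matching cp38 wheel. A sketch, assuming the extracted python/ folder contains wheels for several Python versions named like the one above (helper names are hypothetical):

```shell
# Prints the CPython tag of the current python3, e.g. "cp38".
python_tag() {
  python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# Prints the TensorRT wheel under <trt_dir>/python that matches <tag>,
# or fails if there is none.
tensorrt_wheel() {
  local trt_dir="$1"
  local tag="$2"
  local whl
  for whl in "$trt_dir"/python/tensorrt-*-"$tag"-none-linux_x86_64.whl; do
    if [ -e "$whl" ]; then
      printf '%s\n' "$whl"
      return 0
    fi
  done
  return 1
}
```

Usage: python3 -m pip install "$(tensorrt_wheel TensorRT-8.2.0.6 "$(python_tag)")"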
Docker Part
Troubleshooting
Very important!
A container that uses the GPU fails with this error:
$ docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
Solution:
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime
sudo systemctl restart docker
Update /etc/apt/sources.list
For AMD64:
sudo sed -i -e 's/archive.ubuntu.com/free.nchc.org.tw/' /etc/apt/sources.list
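A slightly more careful variant of the sed line, as a sketch with a hypothetical function name: it keeps a .bak copy of the file, replaces every match on a line rather than only the first, and also replaces a two-letter country prefix such as tw.archive.ubuntu.com instead of leaving it in front of the new host. The mirror host is a parameter; free.nchc.org.tw is the one used above.

```shell
# Points archive.ubuntu.com entries (optionally with a two-letter country
# prefix such as tw.) at <mirror>, keeping a backup as <sources>.bak.
switch_apt_mirror() {
  local mirror="$1"
  local sources="${2:-/etc/apt/sources.list}"
  sed -i.bak -E "s#(https?://)([a-z]{2}\.)?archive\.ubuntu\.com#\1$mirror#g" "$sources"
}
```

Usage (as root): switch_apt_mirror free.nchc.org.tw && apt update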
After adding the ppa:graphics-drivers/ppa PPA, the ubuntu-drivers tool still cannot be found
ubuntu-drivers devices
Install this package:
sudo apt install ubuntu-drivers-common
Unable to locate package nvidia-container-toolkit
Check here: https://nvidia.github.io/nvidia-docker/
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
add-apt-repository: command not found
Solution:
sudo apt install software-properties-common
Some GPUs cannot be detected by lspci
In my case, my RTX 3060 could not be detected by lspci. Updating the PCI ID list and rebooting fixed it (ref):
sudo update-pciids
sudo reboot
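When the local PCI ID list is out of date, lspci typically shows only a numeric device ID (for example "NVIDIA Corporation Device" followed by a hex number) instead of the model name. A hypothetical helper that flags such entries, assuming that usual lspci wording:

```shell
# Reads `lspci` output on stdin and prints NVIDIA display controllers whose
# model name was not resolved (shown as "Device <4-digit hex id>").
unnamed_nvidia_gpus() {
  grep -Ei '(vga compatible|3d|display) controller.*nvidia' |
    grep -E 'Device [0-9a-f]{4}'
}
```

Usage: run lspci | unnamed_nvidia_gpus before and after sudo update-pciids; once the list is current it should print nothing.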