Install NVIDIA GPU driver on Ubuntu

info

In this tutorial, the OS can be supported on Ubuntu 20.04 & 18.04.

In addition, the tutorial can support not only for installation but also for upgrading the version.

Preparing

Check nouveau (Important!)

caution

Check nouveau whether it is working or not. If nouveau is working now, then you need to turn off. Otherwise, after you install the NVIDIA gpu driver, it will encoutner some issues.

Command:

lsmod | grep nouveau

If response is nothing, then it is not working.

If it prints something, then you can follow this instruction Turn off Nouveau to turn off it.

Check kernal version

It will find out GNU gcc compiler version used to compile running kernel.

cat /proc/version

Check gcc version:

gcc --version

If the version is differnet with compiled version, it has better change gcc version to the compiled gcc version of running kernel.

Check the GPU configuration

sudo lshw -C display

You might wanna download the gpu driver from website. If you do not want to install via the commands.

Let's start

Unisntall GPU driver

(If you have installed once, and you wanna upgrade the gpu driver.)

sudo apt-get --purge remove "nvidia*"
sudo apt autoremove
sudo apt-get --purge remove "*cublas*" "cuda*"

Install GPU driver

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices

Output:

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001D81sv000010DEsd00001218bc03sc00i00
vendor   : NVIDIA Corporation
model    : GV100 [TITAN V]
driver   : nvidia-driver-470-server - distro non-free recommended
driver   : nvidia-driver-460-server - distro non-free
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-440 - third-party free
driver   : nvidia-driver-460 - third-party free
driver   : nvidia-driver-450 - third-party free
driver   : nvidia-driver-410 - third-party free
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-465 - third-party free
driver   : nvidia-driver-455 - third-party free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-418 - third-party free
driver   : nvidia-driver-470 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin

I chose 470 version

sudo apt install nvidia-driver-470-server

After we install it, reboot it.

sudo reboot

Check it by nvidia-smi

Sun Oct  3 14:08:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN V      Off  | 00000000:01:00.0 Off |                  N/A |
| 34%   49C    P8    33W / 250W |    163MiB / 12063MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1653      G   /usr/lib/xorg/Xorg                 63MiB |
|    0   N/A  N/A      1965      G   /usr/bin/gnome-shell               97MiB |
+-----------------------------------------------------------------------------+

Troubleshooting

The repository xxx bionic Release' no longer has a Release file.

E: The repository 'https://deb.nodesource.com/node_10.x bionic Release' no longer has a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Solution:

sudo apt install ca-certificates

Connection fail

Connection fail

You can try to do update and upgrade first.

apt-get update
apt-get upgrade --fix-missing
apt update
apt upgrade

MOK Issue
Sometimes you will encounter this situation that it asks you for typing a password for MOK management use. Do not worry about it. Just follow the steps.
After finish the installation, reboot it. You might see a blue screen about perform mok management. Please follow these steps:
1. Choose "Enroll MOK"
2. Choose "continue"
3. Go to "enroll the key", and choose "Yes"
4. Type the password of what you typed during the installation.
5. Reboot.
Done.

Install Cuda

➤ CUDA version : 11.4 (This was for ubuntu 18.04 use.)

Steps:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Then reboot it. Add the cuda folder into bashrc or zshrc.

➤ CUDA version : 11.6 (This was for ubuntu 20.04 use.)

Steps:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Then reboot it. Add the cuda folder into bashrc or zshrc.

For example:

# CUDA:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.6/lib64
export CUDA_INSTALL_DIR=/usr/local/cuda-11.6
export PATH=$PATH:/usr/local/cuda-11.6/bin
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.6

Install cudnn

Check here: https://github.com/chiehpower/Setup-deeplearning-tools#install-cudnn

Copy the include folder files into /usr/local/cuda/include
Copy the lib64 folder files into /usr/local/cuda/lib64

Install TensorRT

tar -zxvf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar
cd TensorRT-8.2.0.6
python3 -m pip install python/tensorrt-8.2.0.6-cp36-none-linux_x86_64.whl
python3 -m pip install uff/uff-0.6.9-py2.py3-none-any.whl
python3 -m pip install graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
python3 -m pip install onnx_graphsurgeon/onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl

Put this line in your bashrc or zshrc.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(your path)/TensorRT-8.2.0.6/lib

Docker Part

Troubleshooting

very important!!

Container used the gpu, encounter error.

$ docker run --gpus all nvidia/cuda:11.0-base nvidia-smi                 
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Solution:

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime
sudo systemctl restart docker

Update /etc/apt/source.list

For AMD64 use

sed -i -e 's/archive.ubuntu.com/free.nchc.org.tw/' /etc/apt/sources.list

After we add the ppa ppa:graphics-drivers/ppa, still cannot find the tool ubuntu-drivers devices
Try to install this:
```
sudo apt install ubuntu-drivers-common
```

Unable to locate package nvidia-container-toolkit

Check here: https://nvidia.github.io/nvidia-docker/

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

add-apt-repository: command not found
Solution:
```
sudo apt install software-properties-common
```
Some GPUs cannot be detect by lspci
In my case, I have a RTX3060 that it cannot be detected by lspci, then we need to install one package and reboot it. Then it can work. ref
```
sudo update-pciids
sudo reboot
```

Preparing​

Check nouveau (Important!)​

Check kernal version​

Check the GPU configuration​

Let's start​

Unisntall GPU driver​

Install GPU driver​

Troubleshooting​

Install Cuda​

➤ CUDA version : 11.4 (This was for ubuntu 18.04 use.)​

➤ CUDA version : 11.6 (This was for ubuntu 20.04 use.)​

Install cudnn​

Install TensorRT​

Docker Part​

Troubleshooting​

Preparing

Check nouveau (Important!)

Check kernal version

Check the GPU configuration

Let's start

Unisntall GPU driver

Install GPU driver

Troubleshooting

Install Cuda

➤ CUDA version : 11.4 (This was for ubuntu 18.04 use.)

➤ CUDA version : 11.6 (This was for ubuntu 20.04 use.)

Install cudnn

Install TensorRT

Docker Part

Troubleshooting