Install NVIDIA GPU driver on Ubuntu

info

This tutorial applies to Ubuntu 20.04 and 18.04.

It covers both a fresh installation and upgrading an existing driver.

Preparing

Check nouveau (Important!)

caution

Check whether nouveau is loaded. If it is, you need to turn it off; otherwise you will encounter problems after installing the NVIDIA GPU driver.

Command:

lsmod | grep nouveau

If the command prints nothing, nouveau is not loaded.

If it prints something, follow the instructions in Turn off Nouveau to disable it.
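The check above can be wrapped in a small script that states the result explicitly. This is only a sketch: `nouveau_loaded` is a made-up helper name, and it matches the module name exactly so that other modules that merely depend on nouveau are not counted.

```shell
# Succeeds if an lsmod-style listing on stdin contains a module named "nouveau".
# (nouveau_loaded is a hypothetical helper name, not part of any tool.)
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only run the live check when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
  if lsmod | nouveau_loaded; then
    echo "nouveau is loaded: disable it before installing the NVIDIA driver"
  else
    echo "nouveau is not loaded"
  fi
fi
```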

Check kernel version

This shows which GCC version was used to compile the running kernel.

cat /proc/version

Check gcc version:

gcc --version

If it differs from the version the kernel was compiled with, it is better to switch gcc to the version used for the running kernel.
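The two checks above can be compared automatically. The sketch below assumes /proc/version uses the "gcc version X.Y.Z" wording (some newer kernels format this differently, in which case nothing is parsed); `kernel_gcc_version` is a hypothetical helper name.

```shell
# Print the GCC version from a /proc/version-style line on stdin.
# Assumes the "gcc version X.Y.Z" wording; prints nothing if it is absent.
kernel_gcc_version() {
  sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
  kernel_gcc=$(kernel_gcc_version < /proc/version)
  # -dumpfullversion needs GCC 7 or newer; fall back to -dumpversion otherwise.
  system_gcc=$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)
  if [ -z "$kernel_gcc" ]; then
    echo "could not parse /proc/version; compare manually"
  elif [ "$kernel_gcc" = "$system_gcc" ]; then
    echo "gcc $system_gcc matches the kernel's compiler"
  else
    echo "mismatch: kernel built with gcc $kernel_gcc, system gcc is $system_gcc"
  fi
fi
```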

Check the GPU configuration

sudo lshw -C display

If you do not want to install via the commands below, you can instead download the GPU driver from the NVIDIA website.


Let's start

Uninstall GPU driver

(Only needed if a driver is already installed and you want to upgrade it.)

sudo apt-get --purge remove "nvidia*"
sudo apt autoremove
sudo apt-get --purge remove "*cublas*" "cuda*"
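Before purging, it can help to see which packages are actually installed. The sketch below roughly mirrors the patterns used above; it is not an exact reproduction of how apt expands them. `installed_gpu_packages` is a hypothetical helper name.

```shell
# Print names of installed packages (dpkg status "ii") that look like
# NVIDIA driver, CUDA, or cuBLAS packages. Reads `dpkg -l` output on stdin.
installed_gpu_packages() {
  awk '$1 == "ii" && ($2 ~ /^nvidia/ || $2 ~ /^cuda/ || $2 ~ /cublas/) { print $2 }'
}

# Only query the package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | installed_gpu_packages
fi
```

An empty result suggests there is nothing to uninstall.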

Install GPU driver

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices

Output:

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001D81sv000010DEsd00001218bc03sc00i00
vendor : NVIDIA Corporation
model : GV100 [TITAN V]
driver : nvidia-driver-470-server - distro non-free recommended
driver : nvidia-driver-460-server - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-440 - third-party free
driver : nvidia-driver-460 - third-party free
driver : nvidia-driver-450 - third-party free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-465 - third-party free
driver : nvidia-driver-455 - third-party free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-418 - third-party free
driver : nvidia-driver-470 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin

I chose the 470 server version, which is the recommended one:

sudo apt install nvidia-driver-470-server
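The recommended entry can also be picked out of the listing with a script instead of by eye. A minimal sketch, assuming the output layout shown above; `recommended_driver` is a hypothetical helper name.

```shell
# Print the package flagged "recommended" in `ubuntu-drivers devices` output (stdin).
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
  pkg=$(ubuntu-drivers devices 2>/dev/null | recommended_driver)
  echo "recommended driver: ${pkg:-none found}"
  # To install it: sudo apt install "$pkg"
fi
```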

After installing it, reboot.

sudo reboot

After the reboot, check the driver with nvidia-smi:

Sun Oct  3 14:08:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA TITAN V Off | 00000000:01:00.0 Off | N/A |
| 34% 49C P8 33W / 250W | 163MiB / 12063MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1653 G /usr/lib/xorg/Xorg 63MiB |
| 0 N/A N/A 1965 G /usr/bin/gnome-shell 97MiB |
+-----------------------------------------------------------------------------+
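For scripts, the driver version can be read without parsing the table. The sketch below uses the `--query-gpu` and `--format` options of nvidia-smi; `driver_major` is a hypothetical helper name.

```shell
# Print the major version from a driver version string such as "470.57.02".
driver_major() {
  printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # --query-gpu with --format=csv,noheader prints one line per GPU.
  version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  echo "driver $version (major $(driver_major "$version"))"
fi
```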

Troubleshooting

  1. The repository 'xxx bionic Release' no longer has a Release file.

    E: The repository 'https://deb.nodesource.com/node_10.x bionic Release' no longer has a Release file.
    N: Updating from such a repository can't be done securely, and is therefore disabled by default.
    N: See apt-secure(8) manpage for repository creation and user configuration details.

    Solution:

    sudo apt install ca-certificates
  2. Connection fail

    Connection fail

    Try updating and upgrading first.

    apt-get update
    apt-get upgrade --fix-missing
    apt update
    apt upgrade
  3. MOK Issue

    Sometimes the installer asks you to set a password for MOK (Machine Owner Key) management. Do not worry about it; just follow the steps.

    After the installation finishes, reboot. You may see a blue screen titled "Perform MOK management". Follow these steps:

    1. Choose "Enroll MOK"
    2. Choose "continue"
    3. Go to "enroll the key", and choose "Yes"
    4. Type the password you set during the installation.
    5. Reboot.

    Done.
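For the MOK item above: the enrollment screen normally appears only when Secure Boot is enabled, so checking that state beforehand tells you whether to expect it. A minimal sketch, assuming the `mokutil` tool is installed and prints "SecureBoot enabled" or "SecureBoot disabled"; `secure_boot_enabled` is a hypothetical helper name.

```shell
# Succeeds if `mokutil --sb-state` output on stdin reports Secure Boot as enabled.
secure_boot_enabled() {
  grep -q 'SecureBoot enabled'
}

if command -v mokutil >/dev/null 2>&1; then
  if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
    echo "Secure Boot is on: expect the MOK enrollment screen after reboot"
  else
    echo "Secure Boot is off or not reported: no MOK enrollment expected"
  fi
fi
```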


Install CUDA

➤ CUDA version: 11.4 (for Ubuntu 18.04)

Steps:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.2-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Then reboot and add the CUDA directories to your bashrc or zshrc.

➤ CUDA version: 11.6 (for Ubuntu 20.04)

Steps:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

Then reboot and add the CUDA directories to your bashrc or zshrc.

For example:

# CUDA:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.6/lib64
export CUDA_INSTALL_DIR=/usr/local/cuda-11.6
export PATH=$PATH:/usr/local/cuda-11.6/bin
export CUDA_HOME=/usr/local/cuda-11.6
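Plain exports like these add the same entries again every time the rc file is sourced. As one possible alternative, the sketch below appends each directory only if it is missing. `append_unique` is a hypothetical helper name, and the 11.6 path is taken from the example above, so adjust it to your installed version.

```shell
# Append a directory to a colon-separated list unless it is already present.
# $1: current list (may be empty), $2: directory. Prints the resulting list.
append_unique() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)
      if [ -n "$1" ]; then
        printf '%s:%s\n' "$1" "$2"
      else
        printf '%s\n' "$2"
      fi
      ;;
  esac
}

CUDA_HOME=/usr/local/cuda-11.6   # assumed install path; adjust to your version
export CUDA_HOME
PATH=$(append_unique "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(append_unique "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH
```

After opening a new shell, `nvcc --version` should report the installed CUDA release.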

Install cuDNN

Check here: https://github.com/chiehpower/Setup-deeplearning-tools#install-cudnn

  1. Copy the include folder files into /usr/local/cuda/include
  2. Copy the lib64 folder files into /usr/local/cuda/lib64
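The two copy steps above could be scripted roughly as follows. This is a sketch under assumptions: the extracted cuDNN archive is taken to contain `include` and `lib64` subfolders as described above, and `copy_cudnn` is a hypothetical helper name.

```shell
# Copy cuDNN headers and libraries from an extracted archive into a CUDA tree.
# $1: extracted cuDNN directory, $2: CUDA directory (e.g. /usr/local/cuda).
copy_cudnn() {
  src=$1
  dest=$2
  [ -d "$dest/include" ] && [ -d "$dest/lib64" ] || {
    echo "missing include/ or lib64/ under $dest" >&2
    return 1
  }
  # -P keeps the libcudnn*.so symlinks as symlinks instead of duplicating files.
  cp -P "$src"/include/cudnn*.h "$dest/include/" &&
    cp -P "$src"/lib64/libcudnn* "$dest/lib64/" &&
    chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}
```

Writing into /usr/local/cuda needs root, so the function would be called from a root shell, for example `copy_cudnn ./cuda /usr/local/cuda`, where `./cuda` is an assumed extraction directory.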

Install TensorRT

tar -zxvf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar
cd TensorRT-8.2.0.6
python3 -m pip install python/tensorrt-8.2.0.6-cp36-none-linux_x86_64.whl
python3 -m pip install uff/uff-0.6.9-py2.py3-none-any.whl
python3 -m pip install graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
python3 -m pip install onnx_graphsurgeon/onnx_graphsurgeon-0.3.12-py2.py3-none-any.whl

Add this line to your bashrc or zshrc:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(your path)/TensorRT-8.2.0.6/lib
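The `cp36` in the TensorRT wheel name above is the CPython tag for Python 3.6, so with a different interpreter pip will reject that file. The sketch below prints the wheel name matching the local `python3`, assuming the archive ships a wheel for that version. `tensorrt_wheel` is a hypothetical helper name.

```shell
# Build the TensorRT wheel path for a CPython tag such as "cp36" or "cp38".
tensorrt_wheel() {
  printf 'python/tensorrt-8.2.0.6-%s-none-linux_x86_64.whl\n' "$1"
}

if command -v python3 >/dev/null 2>&1; then
  # Derive the tag from the interpreter, e.g. Python 3.8 gives "cp38".
  tag=$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])')
  echo "wheel for this interpreter: $(tensorrt_wheel "$tag")"
fi
```

Once the wheel is installed, `python3 -c "import tensorrt; print(tensorrt.__version__)"` is a quick way to confirm the import works.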

Docker Part

Troubleshooting

  1. Very important!

    Running a container with GPU access fails with this error:

    $ docker run --gpus all nvidia/cuda:11.0-base nvidia-smi                 
    docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

    Solution:

    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime
    sudo systemctl restart docker
  1. Update /etc/apt/sources.list

    For AMD64:

    sed -i -e 's/archive.ubuntu.com/free.nchc.org.tw/' /etc/apt/sources.list
  1. After adding the PPA ppa:graphics-drivers/ppa, the ubuntu-drivers command still cannot be found

    Try installing this package:

    sudo apt install ubuntu-drivers-common
  2. Unable to locate package nvidia-container-toolkit

    Check here: https://nvidia.github.io/nvidia-docker/

    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
    sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
  3. add-apt-repository: command not found

    Solution:

    sudo apt install software-properties-common
  4. Some GPUs cannot be detected by lspci

    In my case, an RTX 3060 was not detected by lspci. Updating the PCI ID list and rebooting fixed it.

    sudo update-pciids
    sudo reboot
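For the lspci item above, one way to tell whether the ID list is outdated is to list devices by NVIDIA's PCI vendor ID (10de, as seen in the modalias output earlier) and look for entries without a model name. This sketch assumes lspci prints unknown IDs as "Device xxxx"; `count_unnamed_nvidia` is a hypothetical helper name.

```shell
# Count NVIDIA devices that lspci could not name (shown as "Device xxxx").
# Reads lspci output on stdin; assumes that wording for unknown IDs.
count_unnamed_nvidia() {
  awk '/NVIDIA Corporation Device [0-9a-f]+/ { n++ } END { print n + 0 }'
}

if command -v lspci >/dev/null 2>&1; then
  # -d 10de: limits the listing to NVIDIA's PCI vendor ID.
  unnamed=$(lspci -d 10de: | count_unnamed_nvidia)
  echo "NVIDIA devices without a name: $unnamed"
fi
```

A non-zero count suggests running `sudo update-pciids` as described above.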