1. Install CUDA Toolkit
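The CUDA Toolkit includes the nvcc compiler, the CUDA runtime, and libraries such as cuBLAS and cuFFT. cuDNN and TensorRT are not part of it and are downloaded separately.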
Download the CUDA Toolkit from the CUDA Toolkit Archive.
Set up Environment Variables
export PATH=/usr/local/cuda-12.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
Test the installation
nvcc --version # Display CUDA version
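The two exports above prepend unconditionally, so sourcing them from a shell startup file more than once keeps growing `PATH`, and an empty `LD_LIBRARY_PATH` leaves a trailing colon behind. A minimal sketch of an idempotent variant follows; `prepend_once` and `CUDA_HOME` are hypothetical names introduced here for illustration, and the version directory is the one used above.

```shell
# Prepend a directory to a colon-separated list unless it is already in it.
# Prints the resulting list; it does not modify any variable by itself.
prepend_once() {
  local dir list
  dir=$1
  list=$2
  case ":$list:" in
    *":$dir:"*) printf '%s\n' "$list" ;;             # already present
    *) printf '%s\n' "$dir${list:+:$list}" ;;        # no stray ':' if list is empty
  esac
}

# CUDA_HOME is an illustrative variable; adjust it to your installed version.
CUDA_HOME=/usr/local/cuda-12.2
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Putting this in `~/.bashrc` instead of the plain exports should leave the variables unchanged on repeated sourcing.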
2. Install Drivers
Check if the system detects the NVIDIA GPU:
lspci | grep -i nvidia
Use ubuntu-drivers to check for the recommended NVIDIA driver version:
ubuntu-drivers devices
Install the recommended driver. To let the system install it automatically:
sudo ubuntu-drivers autoinstall
If you need to install a specific driver version manually:
sudo apt install nvidia-driver-535 # The recommended version on my system is 535
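If you prefer not to read the version off the screen, the recommended package name can be pulled out of the `ubuntu-drivers devices` output. This is only a sketch: `recommended_driver` is a hypothetical helper, and it assumes the output contains lines of the form `driver : NAME - distro non-free recommended`, which may differ between releases.

```shell
# Read `ubuntu-drivers devices` output on stdin and print the package name
# from the first "driver" line that is marked "recommended".
# Assumed line format: "driver   : nvidia-driver-535 - distro non-free recommended"
recommended_driver() {
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}
```

Typical use would be `sudo apt install "$(ubuntu-drivers devices | recommended_driver)"`, after checking that the helper actually printed a package name.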
After installation, reboot the system:
sudo reboot
Check if the NVIDIA driver is loaded:
nvidia-smi # The output should match Step 3 from the first part
At this point, the installation should be complete.
Troubleshooting
If there’s no output or you see an error such as:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Run the following command to check if the driver is properly installed:
dpkg -l | grep nvidia
Look for an entry like nvidia-driver-535.
Check if the NVIDIA module is loaded:
lsmod | grep nvidia
If there’s no output, try manually loading it:
sudo modprobe nvidia
If you encounter this error:
modprobe: ERROR: could not insert 'nvidia': Operation not permitted
This is likely due to Secure Boot being enabled.
Check if Secure Boot is enabled:
mokutil --sb-state
If it shows SecureBoot enabled, you need to disable it, as Secure Boot prevents unsigned drivers from loading.
Disabling Secure Boot:
Method 1: BIOS Settings
- Restart the computer and enter the BIOS/UEFI settings.
- Find the Secure Boot option and set it to Disabled.
- Save the changes and exit the BIOS.
Method 2: MOK Settings
- Run the following command to request that Secure Boot validation be disabled at the next boot:
sudo mokutil --disable-validation
- Set a password when prompted.
- Reboot the system:
sudo reboot
- Enter the MOK management interface:
  - “Continue Boot” to proceed with normal startup.
  - “Enroll MOK” to register keys (if you selected to import keys).
  - “Disable Secure Boot” (if you ran mokutil --disable-validation).
  - “Change Password” to change the password.
- Finally, check if the NVIDIA driver is properly loaded:
nvidia-smi
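The checks in this section follow a fixed order (package installed, module loaded, Secure Boot state), so they can be chained into one script. The following is a rough sketch rather than a complete diagnostic tool; `next_step`, `yes_no`, and `diagnose` are hypothetical names, and the messages are only illustrative.

```shell
# Map three yes/no observations to the next troubleshooting step.
next_step() {
  local pkg_installed module_loaded secure_boot
  pkg_installed=$1
  module_loaded=$2
  secure_boot=$3
  if [ "$pkg_installed" = no ]; then
    echo "driver package missing: install it"
  elif [ "$module_loaded" = yes ]; then
    echo "module loaded: run nvidia-smi"
  elif [ "$secure_boot" = yes ]; then
    echo "Secure Boot enabled: disable it and reboot"
  else
    echo "module not loaded: try sudo modprobe nvidia"
  fi
}

# Print "yes" if the given command succeeds, "no" otherwise
# (a missing tool also counts as "no").
yes_no() {
  if "$@" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Collect the observations with the same commands used in this section.
diagnose() {
  next_step \
    "$(yes_no sh -c 'dpkg -l | grep -q nvidia-driver')" \
    "$(yes_no sh -c 'lsmod | grep -q "^nvidia"')" \
    "$(yes_no sh -c 'mokutil --sb-state | grep -q "SecureBoot enabled"')"
}
```

Running `diagnose` prints a single suggestion; the decision logic lives in `next_step` so it can be read and adjusted without touching the system commands.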
3. Nvidia-container-toolkit
This toolkit lets you build and run GPU-accelerated applications in containerized environments (like Docker). It includes a runtime library and utilities that automatically configure containers to use NVIDIA GPUs.
Installation
Check if nvidia-container-toolkit is installed:
dpkg -l | grep nvidia-container-toolkit
Install the toolkit:
sudo apt-get install -y nvidia-container-toolkit
If you encounter the error:
E: Unable to locate package nvidia-container-toolkit
Follow the steps below.
Add NVIDIA GPG key:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
Add the NVIDIA container toolkit repository:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
Update the package list:
sudo apt-get update
Install nvidia-container-toolkit:
sudo apt-get install -y nvidia-container-toolkit
Restart Docker service:
sudo systemctl restart docker
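The `distribution` value used in the repository step is built by sourcing `/etc/os-release` and joining its `ID` and `VERSION_ID` fields. The sketch below isolates that step so you can inspect the value before it goes into the URL; `distribution_id` is a hypothetical helper name, and the optional file argument exists only to make it easy to try on a sample file.

```shell
# Print the distribution string (ID followed by VERSION_ID) from an
# os-release style file. Defaults to /etc/os-release.
distribution_id() {
  local file
  file=${1:-/etc/os-release}
  # Source inside a subshell so ID and VERSION_ID do not leak into the caller.
  ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

On an Ubuntu 22.04 host, where the file contains `ID=ubuntu` and `VERSION_ID="22.04"`, this should print `ubuntu22.04`.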
Usage
Start a container with GPU support: when running a Docker container, use the --gpus all flag to enable GPU support, and the -v flag to mount the host system’s CUDA directory into the container. For example, if the host’s CUDA installation path is /usr/local/cuda, use:
docker run --gpus all -it \
  -v /usr/local/cuda:/usr/local/cuda \
  your_docker_image
Set up environment variables inside the container to ensure it can find nvcc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Test CUDA:
nvcc --version
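The bind mount only helps if the host directory actually exists; otherwise Docker creates an empty directory and `nvcc` will not be found inside the container. A small sketch that checks the path first and prints the resulting command is shown here. `gpu_run_cmd` is a hypothetical helper, and `your_docker_image` remains a placeholder as in the example above.

```shell
# Print a `docker run` command line for a GPU container.
#   $1: image name
#   $2: host CUDA directory (default: /usr/local/cuda)
# The host directory is mounted at /usr/local/cuda inside the container.
gpu_run_cmd() {
  local image cuda_dir
  image=$1
  cuda_dir=${2:-/usr/local/cuda}
  if [ ! -d "$cuda_dir" ]; then
    echo "CUDA directory not found: $cuda_dir" >&2
    return 1
  fi
  printf 'docker run --gpus all -it -v %s:/usr/local/cuda %s\n' "$cuda_dir" "$image"
}
```

The function only prints the command, so you can review it before running it, for example with `gpu_run_cmd your_docker_image /usr/local/cuda-12.2`.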
4. GPU Power Settings
To limit power usage:
sudo nvidia-smi -i 0 -pl 100 # -i 0 selects the first GPU, -pl 100 limits power to 100 W (requires root)
To restore the power limit (160 W here; use your own GPU's default):
sudo nvidia-smi -i 0 -pl 160
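The driver rejects a limit outside the range the card supports, so it can help to clamp the requested value first. The sketch below reads the range through the `power.min_limit` and `power.max_limit` query fields of `nvidia-smi`; `clamp_watts` and `set_power_limit` are hypothetical helper names, and the code assumes the GPU reports numeric limits rather than `[N/A]`.

```shell
# Clamp a requested wattage to the range [min, max].
# Fractional parts such as "160.00" are dropped (whole watts only).
clamp_watts() {
  local want min max
  want=${1%.*}
  min=${2%.*}
  max=${3%.*}
  if [ "$want" -lt "$min" ]; then
    echo "$min"
  elif [ "$want" -gt "$max" ]; then
    echo "$max"
  else
    echo "$want"
  fi
}

# Set the power limit of GPU $1 to $2 watts, clamped to the supported range.
set_power_limit() {
  local gpu want limits min max
  gpu=$1
  want=$2
  # Expected output shape: "60.00, 160.00"
  limits=$(nvidia-smi -i "$gpu" \
    --query-gpu=power.min_limit,power.max_limit \
    --format=csv,noheader,nounits) || return 1
  min=${limits%%,*}
  max=${limits##*, }
  sudo nvidia-smi -i "$gpu" -pl "$(clamp_watts "$want" "$min" "$max")"
}
```

With these definitions, `set_power_limit 0 100` would apply 100 W to the first GPU when that value lies within the reported range, and the nearest bound otherwise.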