Problem: CUDA Drivers Block Your Local AI Setup
You're trying to run LLaMA, Stable Diffusion, or other local AI models on Linux, but you hit `CUDA driver version is insufficient` or `no CUDA-capable device detected` errors even though your NVIDIA GPU is installed.
You'll learn:
- How to diagnose CUDA driver vs toolkit version conflicts
- Fix the most common driver installation issues on Ubuntu/Debian and Arch-based systems
- Verify your setup works with actual AI workloads
Time: 20 min | Level: Intermediate
Why This Happens
CUDA has three separate version numbers that must align: driver version, toolkit version, and runtime version. Most errors occur when:
- Your driver is older than what PyTorch/TensorFlow expects
- Multiple CUDA toolkits are installed (conda vs system)
- Driver survived a kernel update but needs rebuilding
Common symptoms:
- `RuntimeError: CUDA driver version is insufficient for CUDA runtime version`
- `torch.cuda.is_available()` returns `False`
- `nvidia-smi` works but Python can't see the GPU
- Driver works after reboot but fails after kernel update
Solution
Step 1: Check What You Actually Have
# GPU detection
lspci | grep -i nvidia
# Driver version (if installed)
nvidia-smi
# CUDA toolkit version (if installed)
nvcc --version
# What PyTorch expects
python3 -c "import torch; print(f'PyTorch CUDA: {torch.version.cuda}')"
Expected output:
- `nvidia-smi` shows driver version (e.g., 550.54.15)
- `nvcc` shows toolkit version (e.g., 12.4)
- PyTorch shows the CUDA version it was built against
If nvidia-smi fails: Driver isn't installed or kernel module isn't loaded. Continue to Step 2.
If versions mismatch: Your driver is too old for your AI framework. Note the required version and continue.
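The checks above can be wrapped into a single script. Here's a minimal sketch (Python 3, standard library only; `parse_smi_banner` and the regex are illustrative, not part of any NVIDIA tooling) that pulls the driver and CUDA versions out of the `nvidia-smi` banner line:

```python
import re
import shutil
import subprocess

# Matches the nvidia-smi banner line, e.g.:
# "| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4 |"
BANNER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s*.*?CUDA Version:\s*(?P<cuda>[\d.]+)"
)

def parse_smi_banner(text: str):
    """Extract (driver_version, cuda_version) from nvidia-smi output, or None."""
    m = BANNER.search(text)
    return (m.group("driver"), m.group("cuda")) if m else None

if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print(parse_smi_banner(out))
    else:
        print("nvidia-smi not found -- driver not installed or not on PATH")
```

The CUDA version `nvidia-smi` reports is the maximum the driver supports, not what's installed, so it can legitimately differ from `nvcc --version`.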
Step 2: Remove Conflicting Installations
# Ubuntu/Debian - remove old packages
sudo apt remove --purge '^nvidia-.*' '^libnvidia-.*' '^cuda-.*'
sudo apt autoremove
# Arch/Manjaro
sudo pacman -Rns $(pacman -Qq | grep nvidia)
# Remove conda CUDA (if present)
conda list | grep cuda
conda uninstall cudatoolkit cudnn # if found
# Clean module cache (caution: this path also holds in-tree video drivers
# such as nouveau; reinstalling the kernel package restores them if needed)
sudo rm -rf /lib/modules/$(uname -r)/kernel/drivers/video
sudo depmod -a
Why this works: Mixing Ubuntu's nvidia packages with CUDA's official repo, or conda's CUDA with system CUDA, creates version conflicts. Clean slate prevents this.
If it fails:
- "Unable to remove": Check for running processes with `lsof | grep nvidia` and kill them
- Secure Boot enabled: You'll need to sign kernel modules or disable Secure Boot in BIOS
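If `lsof` isn't installed, the same "who is still holding the driver" check can be approximated by walking `/proc`. A rough sketch (Linux only; without root it only sees your own processes; `nvidia_holders` is a name made up for this example):

```python
import os

def nvidia_holders(proc_root: str = "/proc"):
    """Return {pid: [device paths]} for processes with /dev/nvidia* open.
    Rough stand-in for `lsof | grep nvidia`."""
    holders = {}
    if not os.path.isdir(proc_root):
        return holders
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except (PermissionError, FileNotFoundError):
            continue  # not our process, or it already exited
        devs = []
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith("/dev/nvidia"):
                devs.append(target)
        if devs:
            holders[entry] = devs
    return holders

if __name__ == "__main__":
    print(nvidia_holders() or "no processes holding /dev/nvidia*")
```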
Step 3: Install Matching Driver
For Ubuntu 22.04/24.04 (recommended for AI workloads):
# Add official NVIDIA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
# Install driver + CUDA 12.4 (the version this guide targets)
sudo apt install cuda-drivers-550 cuda-toolkit-12-4
# Alternatively, just driver (lighter)
sudo apt install nvidia-driver-550
For Arch/Manjaro:
# Latest driver
sudo pacman -S nvidia nvidia-utils cuda
# Or LTS kernel users
sudo pacman -S nvidia-lts nvidia-utils cuda
For other driver versions, check compatibility:
- CUDA 12.4 requires driver ≥ 550.54.15
- CUDA 12.1 requires driver ≥ 530.30.02
- CUDA 11.8 requires driver ≥ 520.61.05
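When scripting these checks, compare driver versions numerically, field by field, never as strings (lexically, "535" would sort above "1000"). A small helper illustrating the rules above (`meets_minimum` and `MIN_DRIVER` are illustrative names, not a real API):

```python
def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted driver versions numerically, not lexically
    (so 550.54.15 > 530.30.02, and 6.10 > 6.9)."""
    def to_tuple(v: str):
        return tuple(int(p) for p in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# Minimum driver per CUDA release, from the list above
MIN_DRIVER = {"12.4": "550.54.15", "12.1": "530.30.02", "11.8": "520.61.05"}

print(meets_minimum("550.54.15", MIN_DRIVER["12.4"]))   # True
print(meets_minimum("535.183.01", MIN_DRIVER["12.4"]))  # False
```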
Reboot required after installation:
sudo reboot
Step 4: Verify Driver Loads
# Check module is loaded
lsmod | grep nvidia
# Should see nvidia, nvidia_uvm, nvidia_modeset
# Test GPU detection
nvidia-smi
Expected: nvidia-smi displays your GPU name, driver version, and CUDA version.
If it fails:
- "NVIDIA-SMI has failed": Kernel module didn't load
sudo modprobe nvidia
dmesg | grep -i nvidia # Check for errors
- Secure Boot issue: Error mentions "Required key not available"
- Option A: Disable Secure Boot in BIOS
- Option B: Sign modules (advanced, see DKMS documentation)
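`lsmod` is just a formatted view of `/proc/modules`, so the module check is easy to script. A sketch (Linux only; the function name is illustrative):

```python
import os

def loaded_nvidia_modules(modules_text: str):
    """Parse /proc/modules content (the same data lsmod reads) and
    return which nvidia-family modules are loaded."""
    wanted = {"nvidia", "nvidia_uvm", "nvidia_modeset", "nvidia_drm"}
    loaded = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return sorted(wanted & loaded)

if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        mods = loaded_nvidia_modules(f.read())
    print(mods or "no nvidia modules loaded -- try: sudo modprobe nvidia")
```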
Step 5: Install Python CUDA Runtime
# Create clean environment
python3 -m venv ~/ai-env
source ~/ai-env/bin/activate
# Install PyTorch with CUDA 12.4 support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Verify
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"
Expected output:
CUDA available: True
GPU: NVIDIA GeForce RTX 4070
If False:
- Check LD_LIBRARY_PATH:
- `echo $LD_LIBRARY_PATH` should include `/usr/local/cuda/lib64`
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
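The `LD_LIBRARY_PATH` check can be expressed in code too. One subtlety: the variable is colon-separated, so a plain substring test can give false positives (e.g. `/usr/local/cuda/lib64-old`). A tiny helper (the name is made up for this example):

```python
import os

def has_cuda_libdir(env_value: str, libdir: str = "/usr/local/cuda/lib64") -> bool:
    """Check whether a colon-separated LD_LIBRARY_PATH-style value
    contains the CUDA library directory as an exact entry."""
    return libdir in env_value.split(":") if env_value else False

print(has_cuda_libdir(os.environ.get("LD_LIBRARY_PATH", "")))
```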
Verification with Real AI Workload
Test with actual model inference:
# Install transformers
pip install transformers accelerate
# Test CUDA with small model
python3 << 'EOF'
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {torch.cuda.get_device_name(0)}")
# Load small model
model = AutoModelForCausalLM.from_pretrained(
"microsoft/phi-2",
torch_dtype=torch.float16,
device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
# Quick inference test
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=20)
print(tokenizer.decode(outputs[0]))
print("✅ CUDA working with AI model!")
EOF
You should see: Model downloads, loads to GPU, generates text without errors.
If OOM (out of memory):
- Your GPU works! Just need smaller model or quantization
- Try `torch_dtype=torch.float16` or 4-bit quantization with bitsandbytes
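A back-of-the-envelope way to predict whether a model fits: weight memory is roughly parameter count x bytes per parameter, plus overhead for activations and the KV cache. A sketch using phi-2's ~2.7B parameters (the 1.2 overhead factor is a loose assumption, not a measured value):

```python
def vram_estimate_gb(n_params: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to run inference: weights plus a fudge factor
    for activations / KV cache. Real usage varies with context length."""
    return n_params * bytes_per_param * overhead / 1024**3

# phi-2 has ~2.7B parameters
for label, b in [("fp32", 4), ("fp16", 2), ("int4", 0.5)]:
    print(f"{label}: ~{vram_estimate_gb(2.7e9, b):.1f} GB")
```

At fp16 this lands around 6 GB, which is why phi-2 is comfortable on an 8 GB card but tight on 6 GB.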
Common Edge Cases
Issue: Works After Reboot, Fails After Kernel Update
Cause: DKMS didn't rebuild module for new kernel.
Fix:
# Find the registered driver version, then rebuild for the current kernel
# (modinfo won't work here, since the module isn't built for this kernel yet)
dkms status | grep nvidia
sudo dkms install nvidia/550.54.15 -k $(uname -r) # substitute your version
# Or reinstall driver package
sudo apt install --reinstall nvidia-dkms-550
Issue: Multiple GPUs, PyTorch Uses Wrong One
Set default GPU:
export CUDA_VISIBLE_DEVICES=0 # Use first GPU
# Or in Python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # Use second GPU
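One subtlety with the Python variant above: `CUDA_VISIBLE_DEVICES` is read once, when the CUDA runtime initializes, so it must be set before the first CUDA touch. Setting it after `torch.cuda` has initialized has no effect:

```python
import os

# CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes.
# Set it BEFORE importing torch (or before the first torch.cuda call);
# changing it later in an already-initialized process is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

# After this, the selected GPU always appears as cuda:0 inside the process:
# import torch
# torch.cuda.device_count()  # reports 1 visible device
```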
Issue: "CUDA out of memory" with Small Models
Check what's using VRAM:
nvidia-smi
# Look at "Memory-Usage" column
# Kill process hogging GPU
kill -9 <PID>
Clear PyTorch cache:
import torch
torch.cuda.empty_cache()
What You Learned
- CUDA driver version must meet or exceed toolkit requirements
- Mixing system CUDA, conda CUDA, and pip packages causes conflicts
- nvidia-smi working ≠ PyTorch can use GPU (need matching runtimes)
- Kernel updates can break DKMS drivers if not configured correctly
Limitations:
- Secure Boot requires extra steps (module signing or disabling)
- Laptop Optimus setups need additional configuration
- WSL2 on Windows uses different driver model (not covered here)
Quick Reference
Version Compatibility Matrix (Feb 2026)
| AI Framework | Requires CUDA | Min Driver |
|---|---|---|
| PyTorch 2.5 | 12.4 or 12.1 | 550.54.15 |
| TensorFlow 2.17 | 12.3 | 545.23.08 |
| JAX 0.4.35 | 12.x | 550+ |
| llama.cpp | Optional (CPU by default; CUDA if built with it) | N/A for CPU builds |
Essential Commands
# Check everything
nvidia-smi # Driver status
nvcc --version # Toolkit version
python -c "import torch; print(torch.cuda.is_available())" # Runtime check
# Fix common issues
sudo modprobe nvidia # Load module manually
sudo systemctl restart display-manager # Reset graphics after driver change
sudo dkms autoinstall # Rebuild all DKMS modules
# Environment
export CUDA_VISIBLE_DEVICES=0 # Select GPU
export CUDA_LAUNCH_BLOCKING=1 # Debug CUDA errors (slower)
Tested on Ubuntu 24.04 LTS, Arch Linux (Feb 2026), with NVIDIA RTX 3060/4070/4090 and CUDA 12.4