I’m having trouble getting nvidia-container-toolkit from the AUR to work with my GPU. In addition, nvidia-docker on the AUR is deprecated and points to nvidia-container-toolkit. Has anyone gotten this working?
Steps I have followed:
yay -S nvidia-container-toolkit
systemctl restart docker
docker run --gpus all nvidia/cuda:11.3.0-runtime-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.3.0-runtime-ubuntu20.04' locally
11.3.0-runtime-ubuntu20.04: Pulling from nvidia/cuda
35807b77a593: Pull complete
49cb88ffff67: Pull complete
d9b6efb328ff: Pull complete
42e75dc40145: Pull complete
643865375e49: Pull complete
f5fa57203796: Pull complete
ccb371070e40: Pull complete
Digest: sha256:b7206523a857c8ed93940fe1d1cb5d8a4cc1da59bdbd019954624dfe00ea3958
Status: Downloaded newer image for nvidia/cuda:11.3.0-runtime-ubuntu20.04
Failed to initialize NVML: Unknown Error
System info:
(base) [tigran@asus ~]$ hostnamectl
Static hostname: asus
Icon name: computer-laptop
Chassis: laptop
Machine ID: 605b483f9d2c4ea49675eeb514c570b6
Boot ID: 9f13d04fc3ab4fcebbf4a99a1080a3be
Operating System: EndeavourOS
Kernel: Linux 5.15.2-arch1-1
Architecture: x86-64
Hardware Vendor: ASUSTeK COMPUTER INC.
Hardware Model: ROG Zephyrus G14 GA401IV_GA401IV
OS: EndeavourOS rolling rolling
Kernel: x86_64 Linux 5.15.2-arch1-1
Uptime: 12m
Packages: 1089
Shell: bash 5.1.8
Resolution: 2560x1080
DE: KDE 5.88.0 / Plasma 5.23.3
WM: KWin
GTK Theme: Breeze [GTK2/3]
Icon Theme: breeze-dark
Disk: 237G / 946G (27%)
CPU: AMD Ryzen 9 4900HS with Radeon Graphics @ 16x 3GHz
GPU: NVIDIA GeForce RTX 2060 with Max-Q Design
RAM: 2954MiB / 15478MiB
Following the recommendation to add the systemd.unified_cgroup_hierarchy=false parameter to the kernel command line (as shown in /proc/cmdline) solved the issue. Now when I run docker run --gpus all nvidia/cuda:11.3.0-runtime-ubuntu20.04 nvidia-smi, I see the driver and CUDA versions inside the container.
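For anyone who wants to confirm the parameter actually took effect after a reboot, here is a rough sketch of a check. The function names are made up for illustration. Accepting =0 as well as =false assumes systemd's usual boolean parsing, and the stat check relies on the commonly documented behaviour that /sys/fs/cgroup reports cgroup2fs on a unified (v2) hierarchy and tmpfs otherwise.

```shell
# Hypothetical helper: succeeds if a kernel command line contains the
# switch that disables the unified (v2) cgroup hierarchy.
# Pass a string to check it, or no argument to read /proc/cmdline.
has_legacy_cgroup_param() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    # Pad with spaces so only whole parameters match.
    case " $cmdline " in
        *" systemd.unified_cgroup_hierarchy=false "*) return 0 ;;
        *" systemd.unified_cgroup_hierarchy=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints the filesystem type mounted at /sys/fs/cgroup
# (assumption: "cgroup2fs" means cgroup v2, "tmpfs" means v1 or hybrid).
cgroup_fs_type() {
    stat -fc %T /sys/fs/cgroup
}
```

Calling has_legacy_cgroup_param with no argument checks the running system, and cgroup_fs_type shows which hierarchy is actually mounted.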
Now the issue is that I am not able to pass some nvidia-docker specific flags to the container. To reproduce:
PS: The OS boot issue is no longer relevant, as I ended up reinstalling EOS with the non-free NVIDIA drivers and that worked for me. Both the local and the Docker NVIDIA drivers now give the expected results when I run nvidia-smi.
There’s a newline here. Remove it, or escape it with a \.
Also, as this isn’t really an EnOS issue, you might want to do some reading around how to use a Linux system, or, if this is for something like a research project, ask people more locally whether there is any training available.
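To illustrate the point about escaping the newline, here is a small sketch. show_args is a hypothetical stand-in that only prints its arguments, so nothing is actually run through docker. The image and command are the ones from earlier in the thread.

```shell
# Hypothetical stand-in for a real command: prints each argument in brackets.
show_args() {
    printf '[%s]' "$@"
    echo
}

# A backslash as the very last character of a line escapes the newline,
# so the shell reads the three lines below as one command.
joined=$(show_args docker run --gpus all \
    nvidia/cuda:11.3.0-runtime-ubuntu20.04 \
    nvidia-smi)

echo "$joined"
```

Without the trailing backslashes, the shell would end the command at the first newline and try to run the next line as a separate command. A space after the backslash also breaks the continuation.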