How to run a GPU-accelerated Docker container on Arch?

I’m having an issue: containers started through nvidia-container-toolkit from the AUR cannot use my GPU. In addition, nvidia-docker on the AUR is deprecated for some reason and points to nvidia-container-toolkit. Has anyone gotten this working?

steps I have followed:

yay -S nvidia-container-toolkit
systemctl restart docker
docker run --gpus all nvidia/cuda:11.3.0-runtime-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.3.0-runtime-ubuntu20.04' locally
11.3.0-runtime-ubuntu20.04: Pulling from nvidia/cuda
35807b77a593: Pull complete 
49cb88ffff67: Pull complete 
d9b6efb328ff: Pull complete 
42e75dc40145: Pull complete 
643865375e49: Pull complete 
f5fa57203796: Pull complete 
ccb371070e40: Pull complete 
Digest: sha256:b7206523a857c8ed93940fe1d1cb5d8a4cc1da59bdbd019954624dfe00ea3958
Status: Downloaded newer image for nvidia/cuda:11.3.0-runtime-ubuntu20.04
Failed to initialize NVML: Unknown Error

System info:

(base) [tigran@asus ~]$ hostnamectl
 Static hostname: asus
       Icon name: computer-laptop
         Chassis: laptop
      Machine ID: 605b483f9d2c4ea49675eeb514c570b6
         Boot ID: 9f13d04fc3ab4fcebbf4a99a1080a3be
Operating System: EndeavourOS                     
          Kernel: Linux 5.15.2-arch1-1
    Architecture: x86-64
 Hardware Vendor: ASUSTeK COMPUTER INC.
  Hardware Model: ROG Zephyrus G14 GA401IV_GA401IV

OS: EndeavourOS rolling rolling
Kernel: x86_64 Linux 5.15.2-arch1-1
Uptime: 12m
Packages: 1089
Shell: bash 5.1.8
Resolution: 2560x1080
DE: KDE 5.88.0 / Plasma 5.23.3
WM: KWin
GTK Theme: Breeze [GTK2/3]
Icon Theme: breeze-dark
Disk: 237G / 946G (27%)
CPU: AMD Ryzen 9 4900HS with Radeon Graphics @ 16x 3GHz
GPU: NVIDIA GeForce RTX 2060 with Max-Q Design
RAM: 2954MiB / 15478MiB

@dalto @jonathon ? :upside_down_face:

Have you read the comments on the package page? Which drivers do you have installed? What does nvidia-smi show?

Did you read the post-install message that popped up after you installed the package?

I note you recently broke your drivers and did “something” earlier on: Something prevents my os to boot (arbitrarily) - #66 by tmargary


Following the recommendation to add the systemd.unified_cgroup_hierarchy=false kernel parameter solved the issue (after a reboot it shows up in /proc/cmdline). Now when I run docker run --gpus all nvidia/cuda:11.3.0-runtime-ubuntu20.04 nvidia-smi I see the driver and CUDA version inside the container.
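For anyone checking the same thing, here is a minimal sketch that verifies the parameter is active and reports which cgroup hierarchy is mounted. The helper names has_kernel_param and cgroup_mode are hypothetical, and the mapping assumes the usual systemd layout, where /sys/fs/cgroup is cgroup2fs in unified mode and tmpfs in legacy or hybrid mode:

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole word in CMDLINE.
# (Hypothetical helper, not part of any package.)
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# cgroup_mode FSTYPE
# Maps the filesystem type of /sys/fs/cgroup to a readable label.
# Assumes the usual systemd layout (cgroup2fs = unified, tmpfs = v1/hybrid).
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (v2)" ;;
        tmpfs) echo "legacy or hybrid (v1)" ;;
        *) echo "unknown" ;;
    esac
}

# Report on the running system, if /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "systemd.unified_cgroup_hierarchy=false"; then
        echo "kernel parameter is active"
    else
        echo "kernel parameter is not set"
    fi
    cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
fi
```

If the parameter is reported as not set, the bootloader configuration was probably not regenerated or the machine has not been rebooted since the change.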

Now the issue is that I am not able to pass some nvidia-docker-specific flags to the container. To reproduce:

# 1. download: https://developer.nvidia.com/modulus-examples-v2106
# 2. download: https://developer.nvidia.com/modulus-container-v2106
# 3. docker load -i modulus_image_v21.06.tar.gz
# 4. tar -xvzf ./Modulus_examples.tar.gz
# 5. docker run --shm-size==1g --ulimit memlock==-1 --ulimit stack==67108864
#    --runtime nvidia -v ${PWD}/examples:/examples -it modulus:21.06 bash

PS: The OS boot issue is no longer relevant: I ended up reinstalling EOS with the non-free NVIDIA drivers, and that worked. nvidia-smi now gives the expected output both on the host and inside the Docker container.

There’s a newline in the middle of the docker run command in step 5. Remove it, or escape it with a \ .
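A sketch of the escaped form of that command, assuming the doubled == in the post is a typo for a single = (Docker's documented syntax is --shm-size=1g and --ulimit memlock=-1). The run_modulus wrapper and the DOCKER variable are hypothetical, added only so the command line can be dry-run without starting a container:

```shell
#!/bin/sh
# run_modulus: the step 5 command as ONE shell command.
# The trailing backslash joins the two lines; without it the shell
# would run "docker run ..." and "--runtime nvidia ..." separately.
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
run_modulus() {
    "${DOCKER:-docker}" run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
        --runtime nvidia -v "${PWD}/examples:/examples" -it modulus:21.06 bash
}
```

Running DOCKER=echo run_modulus prints the full argument list instead of starting the container. If Docker reports that the nvidia runtime is unknown, --gpus all (the flag that worked earlier in this thread) may be a substitute for --runtime nvidia.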

Also, as this isn’t really an EnOS issue, you might want to do some reading around how to use a Linux system, or, if this is for something like a research project, ask people more locally whether there is any training available.

Agree with you. Just wanted to give the current status. Thank you.

