Struggling to Get an NVIDIA GPU Working in an External Enclosure

nomb85 · September 15, 2022, 1:59am

Hello everyone.

I have only been on EndeavourOS for about a week. I started on Arch many years ago and then switched to Manjaro for a long time. I recently bought a 3090 and threw it into a Razer Core X Chroma I bought after modding it to handle the card. No matter what I did, I could not get the eGPU working on Manjaro.

I recently installed EndeavourOS on my step-dad’s laptop and figured I would install it on mine in place of Manjaro, to work out everything he would need to do to get it running smoothly (we have the same laptop). I decided to stay with it on mine as well because it just feels better, though I’m not really sure how to put that into words.

I have followed the guides I could find, and I also used the nvidia-inst tool. I have made progress, but I am still struggling to get it working. I know I’m not using a normal setup, but I was hoping someone here might have some experience that could get me over the finish line.

I first ran this:

nvidia-inst --series 515 -t

Which then had me run this:

sudo pacman -Syuq --noconfirm --noprogressbar --needed nvidia-dkms nvidia-utils nvidia-settings nvidia-hook && sudo nvidia-installer-kernel-para && sudo grub-mkconfig -o /boot/grub/grub.cfg

Afterwards I ran the first command again, this time without the test flag (-t):

nvidia-inst --series 515

And later I thought I would need prime and did:

nvidia-inst --series 515 --prime

I also made a few changes specific to this system, such as adding ibt=off to the kernel parameters, and rebooted.
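One way to confirm after the reboot that a parameter such as ibt=off actually reached the running kernel is to look for it in /proc/cmdline. This is a minimal sketch: the helper name has_kernel_param is made up for illustration, and the optional second argument exists only so the check can be tried against a sample string.

```shell
# Return success if the exact parameter (e.g. "ibt=off") is on the kernel command line.
# $1: parameter to look for
# $2: optional command line string; defaults to the contents of /proc/cmdline
has_kernel_param() {
    hkp_param=$1
    hkp_cmdline=${2-$(cat /proc/cmdline)}
    # Pad both ends with spaces so only whole, space-separated words match.
    case " $hkp_cmdline " in
        *" $hkp_param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (reads /proc/cmdline of the running system):
#   has_kernel_param ibt=off && echo "ibt=off is active"
```

If the check fails, the parameter was probably not written into the boot loader configuration that was actually used for this boot.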

The device is detected:

$ inxi -G
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics driver: i915 v: kernel
  Device-2: NVIDIA GA102 [GeForce RTX 3090] driver: nvidia v: 515.65.01
  Device-3: Realtek Laptop Camera type: USB driver: uvcvideo
  Display: x11 server: X.Org v: 21.1.4 with: Xwayland v: 22.1.3 driver: X:
    loaded: modesetting unloaded: nvidia gpu: i915 resolution:
    1: 1920x1080~144Hz 2: 2256x1504~60Hz
  OpenGL: renderer: Mesa Intel Graphics (ADL GT2) v: 4.6 Mesa 22.1.7

But no matter what I do I cannot get anything to work on it.

I cannot get any output on the monitor, and even running something like nvidia-smi doesn’t see the card.

I did try a few different xorg conf files. First I tried the one from running nvidia-inst --conf which didn’t work. Then I also tried the 80-igpu-primary-egpu-offload.conf which didn’t get me anywhere either.

Running nvidia-smi just said no devices found.

I should say this is my first attempt at a NVIDIA GPU with Linux. At this point I’m really not sure what to do to troubleshoot or make progress. Any help would be greatly appreciated. Thanks everyone!

ricklinux · September 15, 2022, 2:04am

Have you read through the Arch wiki related to eGPU?

https://wiki.archlinux.org/title/External_GPU#Installation

nomb85 · September 15, 2022, 2:12am

Yes, that is where I got the xorg.conf file that I referenced in my post.
Something isn’t right though because my xrandr doesn’t show two providers.
Just the one internal one.
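For reference, the provider count can be read from xrandr --listproviders, which prints one "Provider N:" line per provider. A small sketch follows; count_providers is a hypothetical helper name, not part of xrandr.

```shell
# Count the providers in `xrandr --listproviders` output read from stdin.
# The "Providers: number : N" header line is ignored; only the
# per-provider lines ("Provider 0: ...", "Provider 1: ...") are counted.
count_providers() {
    awk '/^Provider [0-9]+:/ { n++ } END { print n + 0 }'
}

# Usage (needs a running X session):
#   xrandr --listproviders | count_providers
```

With the iGPU-primary, eGPU-offload layout, two providers would be expected here. A count of 1, as in this case, suggests that X never picked up the NVIDIA card.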

ricklinux · September 15, 2022, 2:42am

There are two different xorg files. Which one are you using?

nomb85 · September 15, 2022, 2:48am

Well I tried two different xorg files but not at the same time.

I currently only have one /etc/X11/xorg.conf.d/80-igpu-primary-egpu-offload.conf:

$ cat /etc/X11/xorg.conf.d/80-igpu-primary-egpu-offload.conf 
Section "Device"
    Identifier "Device0"
    Driver     "modesetting"
EndSection

Section "Device"
    Identifier "Device1"
    Driver     "nvidia"
    BusID      "PCI:127:00:0"                # Edit according to lspci, translate from hex to decimal.
    Option     "AllowExternalGpus" "True"    # Required for proprietary NVIDIA driver.
EndSection

I did also try just something like this:

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    BusID      "PCI:127:00:0"                # Edit according to lspci, translate from hex to decimal.
    Option     "AllowExternalGpus" "True"    # Required for proprietary NVIDIA driver.
EndSection

That is, without the top section, but then the screen was just black.

ricklinux · September 15, 2022, 12:57pm

Where did you get this BusID from?

BusID      "PCI:127:00:0"

https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/egpu.html

nomb85 · September 15, 2022, 4:32pm

I got it from the lspci output.

$ lspci | grep NVIDIA
7f:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
7f:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)

And then converting from hex to decimal: 7f = 127, 00 = 0.
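The same conversion can be scripted so it does not have to be done by hand. A minimal sketch, assuming the address has the usual lspci form bus:device.function (optionally prefixed with a PCI domain, which is simply dropped here on the assumption that it is domain 0); the function name lspci_to_xorg_busid is made up for illustration.

```shell
# Convert an lspci address (hex, e.g. "7f:00.0" or "0000:7f:00.0")
# into the decimal "PCI:bus:device:function" form used by Xorg's BusID.
lspci_to_xorg_busid() {
    l2x_addr=$1
    # Drop a leading PCI domain if present (three colon-separated fields).
    case $l2x_addr in
        *:*:*) l2x_addr=${l2x_addr#*:} ;;
    esac
    l2x_bus=${l2x_addr%%:*}     # text before the colon
    l2x_rest=${l2x_addr#*:}     # "device.function"
    l2x_dev=${l2x_rest%%.*}     # text before the dot
    l2x_func=${l2x_rest#*.}     # text after the dot
    # printf reads the 0x-prefixed values as hex and prints them as decimal.
    printf 'PCI:%d:%d:%d\n' "0x$l2x_bus" "0x$l2x_dev" "0x$l2x_func"
}

# Usage:
#   lspci_to_xorg_busid 7f:00.0
```

For the address above this yields PCI:127:0:0, which is equivalent to the PCI:127:00:0 already in the config, so the BusID itself does not look like the problem.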

ricklinux · September 15, 2022, 5:23pm

Did you read this? Maybe it helps?

https://us.download.nvidia.com/XFree86/Linux-x86_64/465.27/README/faq.html#busid

nomb85 · September 15, 2022, 6:50pm

I hadn’t read that, as I followed the Arch guide, but it aligns with what the guide says, which is also what I did, so it looks like what I have is correct.

nomb85 · September 15, 2022, 7:06pm

I think that the weirdness is coming from the fact that some commands show the GPU and others do not see it. And running inxi -G says the driver is unloaded but I have no idea why.

$ inxi -G
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics driver: i915 v: kernel
  Device-2: NVIDIA GA102 [GeForce RTX 3090] driver: nvidia v: 515.65.01
  Device-3: Realtek Laptop Camera type: USB driver: uvcvideo
  Display: x11 server: X.Org v: 21.1.4 with: Xwayland v: 22.1.3 driver: X:
    loaded: modesetting unloaded: nvidia gpu: i915 resolution:
    1: 1920x1080~144Hz 2: 2256x1504~60Hz
  OpenGL: renderer: Mesa Intel Graphics (ADL GT2) v: 4.6 Mesa 22.1.7

$ lspci
7f:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
7f:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)

$ nvidia-smi
No devices were found

ricklinux · September 15, 2022, 10:00pm

It shows the driver is installed and loaded; it’s just not running on the Nvidia driver. It’s running on the Intel driver. You have to make it switch somehow. It’s supposed to when plugged in. I’m not sure what port you are plugging it into.

nomb85 · September 16, 2022, 12:09am

I think it’s a bit more than that; I’m just not sure what. I also haven’t used Wayland for a long time at this point and I’m really rusty with X.

I thought the above shows modesetting as loaded but nvidia as unloaded, which I’m assuming means the X server hasn’t loaded the nvidia driver.

However, to play devil’s advocate against myself, lsmod shows this:

$ lsmod | grep nvidia
nvidia_drm             73728  0
nvidia_modeset       1429504  1 nvidia_drm
nvidia_uvm           2740224  0
nvidia              45383680  8 nvidia_uvm,nvidia_modeset

So I really have no idea.

What I do know though, is that even without having the nvidia GPU configured for any sort of display, I am supposed to be able to use the NVIDIA utils such as nvidia-smi and that doesn’t see a GPU at all.
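Since nvidia-smi talks to the kernel module rather than to X, the kernel log is a reasonable place to look when it reports no devices. The NVIDIA kernel driver prefixes its messages with NVRM. A hedged sketch follows; nvidia_kernel_messages is a hypothetical helper name, and whether anything useful shows up depends on the system.

```shell
# Keep only the lines from stdin that mention the NVIDIA driver.
# "NVRM" is the prefix the NVIDIA kernel module uses for its messages;
# matching "nvidia" as well catches the related modules (nvidia-modeset etc.).
nvidia_kernel_messages() {
    grep -i -E 'NVRM|nvidia'
}

# Usage (either source of kernel messages works):
#   sudo dmesg | nvidia_kernel_messages
#   journalctl -k -b | nvidia_kernel_messages
```

Any errors the module logged while probing the card at 7f:00.0 would appear in this output and would be worth posting alongside the inxi and lspci results.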

ricklinux · September 16, 2022, 2:01am

It’s rendering on the Intel graphics. It’s not using Nvidia whatsoever. But the Nvidia drivers are installed.

 OpenGL: renderer: Mesa Intel Graphics (ADL GT2) v: 4.6 Mesa 22.1.7

Nvidia 
~~~
Device-2: NVIDIA GA102 [GeForce RTX 3090] driver: nvidia v: 515.65.01
~~~

~~~
$ nvidia-smi
No devices were found
~~~

If it were running on Nvidia, it would be rendering using the Nvidia graphics.

ricklinux · September 16, 2022, 3:04am

How is your eGPU connected?

https://wiki.archlinux.org/title/External_GPU#Xorg

I also said before that you may need a way of switching, using optimus-manager for instance.

Tip: If using optimus-manager on a laptop, you can render on eGPU by adding the BusId of the eGPU in the appropriate file for your mode and graphics card in /etc/optimus-manager/xorg/.