Issues with xorg and Nvidia 545

My system has been broken ever since I ran nvidia-inst while trying to fix issues with nvidia 545 and supergfxctl (Nvidia dGPU [GTX 1660 Ti Mobile], AMD iGPU [Radeon Vega Mobile]). Xorg won’t start and throws various errors at me whenever I try to start it. I’ve been checking various threads here and nothing has really helped; each fix only changes which error Xorg fails on.

Xorg for now is erroring with “cannot start framebuffer mode, specify busid”

Any help is appreciated, and lmk what info commands you all need.

inxi -Fxxc0z:
http://ix.io/4LAY

journalctl:
http://ix.io/4LAZ

xorg logs:
http://ix.io/4LB0

The following has not worked:

  1. Reverting to old drivers / kernel
  2. Reinstalling drivers
  3. Regenerating the Xorg configs
  4. Deleting the Xorg configs
  5. Deleting xf86-video-fbdev and related packages
  6. Switching to linux-lts
  7. Adding ibt=off to the kernel parameters

You might find it useful to try the “nouveau” driver.
https://wiki.archlinux.org/title/Nouveau

Ok, I found my notes from when I had an issue on one of the workstations.
I was able to get rid of nvidia and install “nouveau” with:

$ sudo pacman -Rnsc nvidia-dkms nvidia-utils
$ reboot

After rebooting, nvidia was gone and I’ve been using Nouveau since then, with the reliability GNU/Linux is known for.

So that could help you get your computer back and access what you need. Or even keep nouveau!!! :wink:

It would probably be helpful if you shared which Nvidia GPU you are using in your system.

edited.

Nvidia Gpu: GTX 1660TI Mobile

Amd IGPU: Radeon Vega Mobile

Switched to nouveau. Still no progress, same error as before:

(EE) Cannot run in framebuffer mode. Please specify busIDs

What could nvidia-inst have done to fuck things up this badly?

Oh… :frowning:
Is there something that could give you hints in there:
$ journalctl -g error -b 0

That is, of course, if you can get to a console (Ctrl+Alt+F3).

P.S. The more knowledgeable power users may ask you for more info for an in-depth search and help.
Here’s the link on how to include logs:
https://discovery.endeavouros.com/forum-log-tool-options/how-to-include-systemlogs-in-your-post/2021/03/
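
For skimming Xorg logs like the ones linked in this thread, it can help to print only the error lines. This is a minimal sketch, not an existing tool: xorg_errors is a made-up helper name, and the default log path is an assumption (a rootless Xorg usually writes to ~/.local/share/xorg/Xorg.0.log instead).

```shell
# Sketch: print only the (EE) error lines from an Xorg log.
# xorg_errors is a hypothetical helper name, not an existing command.
xorg_errors() {
    # Assumed default path; pass another log file as the first argument.
    log="${1:-/var/log/Xorg.0.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # Match (EE) only at the start of a line, after the optional
    # "[ timestamp ]" prefix, so the "Markers:" legend is skipped.
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)' "$log"
}
```

Call it as xorg_errors for the default log, or as xorg_errors ~/.local/share/xorg/Xorg.0.log for a rootless session. It returns non-zero when the log contains no (EE) lines.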

Collected info from OP logs:

Graphics:
Device-1: NVIDIA TU116M [GeForce GTX 1660 Ti Mobile] vendor: ASUSTeK driver: nouveau v: kernel
arch: Turing pcie: speed: 8 GT/s lanes: 8 ports: active: none empty: HDMI-A-1 bus-ID: 01:00.0
chip-ID: 10de:2191 temp: 31.0 C
Device-2: AMD Picasso/Raven 2 [Radeon Vega Series / Radeon Mobile Series] vendor: ASUSTeK
driver: amdgpu v: kernel arch: GCN-5 pcie: speed: 8 GT/s lanes: 16 ports: active: eDP-1
empty: none bus-ID: 05:00.0 chip-ID: 1002:15d8 temp: 46.0 C
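
The Xorg error asks for bus IDs, and the inxi output above lists them as 01:00.0 (Nvidia) and 05:00.0 (AMD). Xorg's BusID option uses a different notation, PCI:bus:device:function in decimal, while lspci and inxi print hex. Here is a minimal sketch of the conversion, assuming the usual bus:device.function input with an optional four-digit domain prefix (to_xorg_busid is a made-up helper name):

```shell
# Sketch: convert an lspci/inxi bus ID (hex "bus:device.function")
# into the decimal "PCI:bus:device:function" form used by Xorg's BusID.
# to_xorg_busid is a hypothetical helper name.
to_xorg_busid() {
    id="${1#????:}"      # drop an optional domain prefix such as "0000:"
    bus="${id%%:*}"      # e.g. "05"
    rest="${id#*:}"      # e.g. "00.0"
    dev="${rest%%.*}"    # e.g. "00"
    fn="${rest#*.}"      # e.g. "0"
    # printf reads the 0x-prefixed values as hex and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

to_xorg_busid 01:00.0   # Nvidia dGPU from the inxi output
to_xorg_busid 05:00.0   # AMD iGPU from the inxi output
```

For the two GPUs here that gives PCI:1:0:0 and PCI:5:0:0. The hex-to-decimal step only changes the digits once a number goes past 09, for example bus 0a becomes 10.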

http://ix.io/4LB7

Don’t know if any of those are related, but I couldn’t see anything directly relevant.

Rest of the logs are edited into the OP.

Information from forum search; possibly needed for troubleshooting:
https://wiki.archlinux.org/title/Hybrid_graphics#Fully_power_down_discrete_GPU

I’m still reading the logs and searching. (The errors that follow might not be related … I just don’t have the knowledge for that.)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI FADT declares the system doesn’t support PCIe ASPM, so disable it

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.GPP0.SWUS.SWDS], AE_NOT_FOUND (20230628/dswload2-162)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)

Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xbd6a5000-0xbd6a5fff flags 0x200] vs bd6a5000 4000
Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xbd6a9000-0xbd6a9fff flags 0x200] vs bd6a9000 4000
Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: Disabling hwrng


Display: server: X.org v: 1.21.1.9 with: Xwayland v: 23.2.2 driver: X: loaded: modesetting
alternate: nvidia gpu: amdgpu tty: 240x67


Local Storage: total: 1.14 TiB used: 184.64 GiB (15.8%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: PC SN520 SDAPNUW-256G-1002 size: 238.47 GiB
speed: 15.8 Gb/s lanes: 2 serial: temp: 38.9 C
ID-2: /dev/sda vendor: Toshiba model: MQ04ABF100 size: 931.51 GiB speed: 6.0 Gb/s
serial:
Partition:
ID-1: / size: 233.38 GiB used: 184.34 GiB (79.0%) fs: ext4 dev: /dev/nvme0n1p2
ID-2: /boot/efi size: 299.4 MiB used: 299.4 MiB (100.0%) fs: vfat dev: /dev/nvme0n1p1
Swap:
Alert: No swap data was found.

With 545 they changed the kernel module settings. Perhaps this has something to do with it.

https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting

To enable DRM (Direct Rendering Manager) kernel mode setting, set modeset=1 and fbdev=1 kernel module parameters for the nvidia_drm module. The latter is required to tell the nvidia driver to provide its own framebuffer device instead of relying on efifb or vesafb, which don’t work under simpledrm. For nvidia driver version < 545, the nvidia_drm.modeset=1 option must be set through kernel parameters, in order to disable simpledrm (for more information, refer to FS#73720).

No success guarantee:


sudo pacman -S nvidia-dkms nvidia-utils lib32-nvidia-utils nvidia-settings

sudo touch /etc/dracut.conf.d/nvidia.conf
sudo nano /etc/dracut.conf.d/nvidia.conf

Insert the following:

force_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "

When using default systemd-boot and dracut:

sudo reinstall-kernels

When using Grub and dracut:

sudo dracut-rebuild

sudo touch /etc/modprobe.d/nvidia.conf
sudo nano /etc/modprobe.d/nvidia.conf

Insert the following:

options nvidia_drm modeset=1 fbdev=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1

Remove previously used kernel parameters, if present. Consult the Arch wiki for your bootloader (https://wiki.archlinux.org/title/Kernel_parameters):

nvidia_drm.modeset=1  
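
After the rebuild and a reboot, it may be worth checking that the options from /etc/modprobe.d/nvidia.conf actually reached the module. This is a rough sketch, assuming nvidia_drm exposes its parameters under /sys/module/nvidia_drm/parameters the way kernel modules normally do. nvidia_drm_params is a hypothetical helper name, and its optional argument only exists so it can be pointed at a test directory instead of /sys.

```shell
# Sketch: show the nvidia_drm parameters the running kernel is using.
# nvidia_drm_params is a hypothetical helper name; the optional first
# argument replaces /sys and is only meant for testing.
nvidia_drm_params() {
    dir="${1:-/sys}/module/nvidia_drm/parameters"
    if [ ! -d "$dir" ]; then
        echo "nvidia_drm does not appear to be loaded" >&2
        return 1
    fi
    for p in modeset fbdev; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            # Assumption: a missing file means this driver version
            # does not expose the parameter.
            printf '%s=unavailable\n' "$p"
        fi
    done
}
```

With the modprobe options above applied, I would expect both values to read Y. If the function reports that the module is not loaded, the initramfs or modprobe configuration is the first place to look.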

nvidia-inst does not do much besides installing the needed packages and enabling DRM modesetting.

It also has a proper reset option, nvidia-inst -n, which reverts to the open-source nouveau driver and removes the nvidia packages and the boot parameter.

But the changes mentioned on the Arch wiki could still be helpful in this case.

I will check about this…

My EndeavourOS install is kinda old and is still using mkinitcpio + GRUB.

Edit: disregard, I installed dracut.

This didn’t solve it, unfortunately. Same framebuffer error.

I’m sorry to hear this.
Is there any option in your BIOS settings to turn off the Nvidia GPU? Have you tried booting with the AMD GPU as the only option, to make sure the Nvidia GPU really is the culprit?

supergfxctl is how I normally disabled the GPU, and installing it does disable it, but it also prevents the GPU from loading at all. It (along with every other GPU switcher) also shows some very strange behavior, probably related to the root issue: it won’t switch modes, and the daemon hangs while trying to switch.

With the nvidia gpu disabled xorg throws a different error instead:
(EE) no screens found(EE)

and still refuses to launch

every time I see the terminal hang at
[ OK ] Started Simple Desktop Display Manager.

i get one step closer to moving to pop!_os and having system76 deal with my problems

Can you post the result of:

ls -a /etc/X11/xorg.conf.d/

and

ls -a /etc/X11/

and read this, perhaps this is a useful hint:

https://wiki.archlinux.org/title/NVIDIA/Troubleshooting#X_fails_with_"no_screens_found"_when_using_Multiple_GPUs

ls -a /etc/X11/xorg.conf.d/

00-keybord.conf

ls -a /etc/X11/

tigervnc
xinit
xorg.conf.backup
xorg.conf.d
xorg.conf.nvidia-xconfig-original

Log of xorg after specifying integrated graphics:
http://ix.io/4LEq

throws a new error:
“parse_vt_settings: Cannot open /dev/tty0 (Permission denied)”

still not running

holy god this made no sense

I resorted to completely wiping all xorg config files and installing supergfxctl again. for some reason this fixes it.

I am never touching nvidia drivers again


the xorg config is very unstable, might use this as an opportunity to move files off and do a reinstall of eos.

Many wikis note that it is better to have no Xorg config files at all with the nvidia drivers. That would have been my next suggestion (which is why I asked for the output of “ls -a …”): rename or move every config file in those directories. But I was busy at work, so I couldn’t answer until now :slight_smile:

But I’m glad to hear it’s working now.
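
For anyone landing here with the same problem: moving the config files aside is safer than deleting them, since they can be put back if something else breaks. This is a minimal sketch, assuming the layout shown in the ls output earlier in the thread (xorg.conf variants directly under /etc/X11, plus *.conf files in xorg.conf.d). stash_xorg_configs and the default destination directory are made up for illustration.

```shell
# Sketch: move Xorg config files aside instead of deleting them.
# stash_xorg_configs and the default destination are hypothetical.
# Usage: stash_xorg_configs [source_dir] [destination_dir]
stash_xorg_configs() {
    src="${1:-/etc/X11}"
    dest="${2:-$src/disabled-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$dest/xorg.conf.d" || return 1
    # xorg.conf and its variants (xorg.conf.backup, ...); the -f test
    # skips the xorg.conf.d directory, which also matches the glob.
    for f in "$src"/xorg.conf "$src"/xorg.conf.*; do
        if [ -f "$f" ]; then
            mv -- "$f" "$dest"/
        fi
    done
    # Drop-in snippets, including things like the keyboard layout file.
    for f in "$src"/xorg.conf.d/*.conf; do
        if [ -f "$f" ]; then
            mv -- "$f" "$dest/xorg.conf.d"/
        fi
    done
}
```

It needs to run in a root shell to touch /etc/X11, and it is worth trying on a scratch copy first, e.g. stash_xorg_configs /tmp/X11-copy /tmp/X11-stash. Note that this also moves the keyboard layout snippet, so that one may need to be copied back afterwards.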