Issues with xorg and Nvidia 545

NovaStorm93 · November 15, 2023, 4:35pm

My system has been broken ever since I ran nvidia-inst to try to fix issues with nvidia 545 and supergfxctl (Nvidia dGPU: GTX 1660 Ti Mobile; AMD iGPU: Radeon Vega Mobile). Xorg won’t start and throws various errors whenever I try to start it. I’ve been checking various threads here and nothing has really helped; the suggested fixes only change which error Xorg fails on.

For now, Xorg fails with “Cannot run in framebuffer mode. Please specify busIDs”

Any help is appreciated, and let me know which commands’ output you need.

inxi -Fxxc0z:
http://ix.io/4LAY

journalctl:
http://ix.io/4LAZ

xorg logs:
http://ix.io/4LB0

The following has not worked:

Reverting to old drivers / kernel
Reinstalling drivers
Regenerating the xorg configs
Deleting the xorg configs
Deleting xf86-video-fbdev and related packages
Switching to linux-lts
Adding ibt=off to kernel parameters

anon64645096 · November 15, 2023, 5:49pm

You might find it useful to try the “nouveau” driver.
https://wiki.archlinux.org/title/Nouveau

Ok, I found my notes from when I had an issue on one of the workstations.
I was able to get rid of nvidia and install “nouveau” with:

$ sudo pacman -Rnsc nvidia-dkms nvidia-utils
$ reboot

After rebooting, nvidia was gone, and I’ve been using nouveau since then with the reliability GNU/Linux is known for.

So that could help you get your computer back and access what you need. Or even keep nouveau!!!

Cphusion · November 15, 2023, 6:34pm

It would probably be helpful if you shared which Nvidia GPU you are using in your system.

NovaStorm93 · November 15, 2023, 6:45pm

edited.

Nvidia Gpu: GTX 1660TI Mobile

Amd IGPU: Radeon Vega Mobile

NovaStorm93 · November 15, 2023, 7:12pm

Switched to nouveau. Still no progress, same error as before:

(EE) Cannot run in framebuffer mode. Please specify busIDs

What could nvidia-inst have done to fuck things up this badly?

anon64645096 · November 15, 2023, 7:20pm

Oh…
Is there something that could give you hints in there:
$ journalctl -g error -b 0

Of course, that assumes you can get to a console: Ctrl+Alt+F3
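
If the journal is too noisy, the Xorg log itself marks errors with (EE). A small sketch, assuming the log lives at /var/log/Xorg.0.log (the usual place when a display manager starts X as root; a rootless session typically writes to ~/.local/share/xorg/Xorg.0.log instead). The helper name xorg_errors is made up for illustration.

```shell
# Print only the error lines of an Xorg log, with their line numbers.
# $1 is the log path; the default is an assumption about where the log lives.
xorg_errors() {
    # Match "(EE)" at the start of a line, optionally after the "[ 12.345]"
    # timestamp, so the marker legend near the top of the log is not reported.
    grep -nE '^(\[[^]]*\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}

xorg_errors /var/log/Xorg.0.log || echo "no (EE) lines found, or log not readable"
```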

P.S. The more knowledgeable power users may ask you for more info for an in-depth search and help.
Here’s the link:
https://discovery.endeavouros.com/forum-log-tool-options/how-to-include-systemlogs-in-your-post/2021/03/

Collected info from OP logs:

Graphics:
Device-1: NVIDIA TU116M [GeForce GTX 1660 Ti Mobile] vendor: ASUSTeK driver: nouveau v: kernel
arch: Turing pcie: speed: 8 GT/s lanes: 8 ports: active: none empty: HDMI-A-1 bus-ID: 01:00.0
chip-ID: 10de:2191 temp: 31.0 C
Device-2: AMD Picasso/Raven 2 [Radeon Vega Series / Radeon Mobile Series] vendor: ASUSTeK
driver: amdgpu v: kernel arch: GCN-5 pcie: speed: 8 GT/s lanes: 16 ports: active: eDP-1
empty: none bus-ID: 05:00.0 chip-ID: 1002:15d8 temp: 46.0 C
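
Side note on the “Please specify busIDs” error: Xorg’s BusID option wants decimal PCI:bus:device:function values, while inxi and lspci print them in hex as bus:device.function. A minimal sketch of the conversion, assuming the short form without a PCI domain prefix and a shell whose printf accepts 0x-prefixed numbers (bash and dash do). The function name pci_to_xorg_busid is made up for illustration.

```shell
# Convert a hex bus ID such as 05:00.0 into Xorg's decimal BusID form.
# Runs in a subshell so the helper variables do not leak into the caller.
pci_to_xorg_busid() (
    id="$1"
    bus="${id%%:*}"      # part before the colon
    rest="${id#*:}"      # device.function
    dev="${rest%%.*}"
    func="${rest#*.}"
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$func"
)

pci_to_xorg_busid 01:00.0   # NVIDIA dGPU from the inxi output
pci_to_xorg_busid 05:00.0   # AMD iGPU from the inxi output
```

For the two devices listed above this prints PCI:1:0:0 and PCI:5:0:0, which is the form a Device section’s BusID line expects.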

NovaStorm93 · November 15, 2023, 7:48pm

http://ix.io/4LB7

Don’t know if any of those are related, but I couldn’t see anything directly relevant.

Rest of the logs are edited into the OP.

anon64645096 · November 15, 2023, 8:14pm

Information from forum search; possibly needed for troubleshooting:
https://wiki.archlinux.org/title/Hybrid_graphics#Fully_power_down_discrete_GPU

I’m continuing to read the logs and search. (The errors that follow might not be related … I just don’t have the knowledge for that.)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI FADT declares the system doesn’t support PCIe ASPM, so disable it

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.GPP0.SWUS.SWDS], AE_NOT_FOUND (20230628/dswload2-162)

Nov 15 14:29:17 NovaStorm93Laptop kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230628/psobject-220)

Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xbd6a5000-0xbd6a5fff flags 0x200] vs bd6a5000 4000
Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: [Firmware Bug]: ACPI region does not cover the entire command/response buffer. [mem 0xbd6a9000-0xbd6a9fff flags 0x200] vs bd6a9000 4000
Nov 15 14:29:17 NovaStorm93Laptop kernel: tpm_crb MSFT0101:00: Disabling hwrng

Display: server: X.org v: 1.21.1.9 with: Xwayland v: 23.2.2 driver: X: loaded: modesetting
alternate: nvidia gpu: amdgpu tty: 240x67

Local Storage: total: 1.14 TiB used: 184.64 GiB (15.8%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: PC SN520 SDAPNUW-256G-1002 size: 238.47 GiB
speed: 15.8 Gb/s lanes: 2 serial: temp: 38.9 C
ID-2: /dev/sda vendor: Toshiba model: MQ04ABF100 size: 931.51 GiB speed: 6.0 Gb/s
serial:
Partition:
ID-1: / size: 233.38 GiB used: 184.34 GiB (79.0%) fs: ext4 dev: /dev/nvme0n1p2
ID-2: /boot/efi size: 299.4 MiB used: 299.4 MiB (100.0%) fs: vfat dev: /dev/nvme0n1p1
Swap:
Alert: No swap data was found.

Svartis · November 15, 2023, 11:35pm

With 545 they changed the kernel module settings. Perhaps this has something to do with it.

https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting

To enable DRM (Direct Rendering Manager) kernel mode setting, set modeset=1 and fbdev=1 kernel module parameters for the nvidia_drm module. The latter is required to tell the nvidia driver to provide its own framebuffer device instead of relying on efifb or vesafb, which don’t work under simpledrm. For nvidia driver version < 545, the nvidia_drm.modeset=1 option must be set through kernel parameters, in order to disable simpledrm [1] (for more information, refer to FS#73720).

No success guarantee:


sudo pacman -S nvidia-dkms nvidia-utils lib32-nvidia-utils nvidia-settings

sudo touch /etc/dracut.conf.d/nvidia.conf
sudo nano /etc/dracut.conf.d/nvidia.conf

Insert the following:

force_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "

When using default systemd-boot and dracut:

sudo reinstall-kernels

When using Grub and dracut:

sudo dracut-rebuild

sudo touch /etc/modprobe.d/nvidia.conf
sudo nano /etc/modprobe.d/nvidia.conf

Insert the following:

options nvidia_drm modeset=1 fbdev=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1

Remove previously used kernel parameters, if present (consult the Arch wiki for your bootloader: https://wiki.archlinux.org/title/Kernel_parameters):

nvidia_drm.modeset=1
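
After a reboot it may be worth checking that the module options were actually picked up. A rough sketch that reads them back from sysfs; the function name show_module_params and its optional second argument (an alternative sysfs root, handy for dry runs) are made up for illustration. With the options above applied, modeset and fbdev should both show up as Y, assuming the nvidia_drm module is loaded.

```shell
# Print each readable parameter of a loaded kernel module as name=value.
# $1: module name. $2: optional sysfs root, defaults to /sys/module.
show_module_params() {
    smp_dir="${2:-/sys/module}/$1/parameters"
    if [ ! -d "$smp_dir" ]; then
        echo "$1: module not loaded (no $smp_dir)" >&2
        return 1
    fi
    for smp_file in "$smp_dir"/*; do
        [ -r "$smp_file" ] || continue   # some parameters are root-only
        printf '%s=%s\n' "${smp_file##*/}" "$(cat "$smp_file")"
    done
}

show_module_params nvidia_drm || true
```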

joekamprad · November 16, 2023, 12:37am

nvidia-inst basically doesn’t do much aside from setting DRM modesetting, in addition to installing the needed packages.

And it has a proper reset option, nvidia-inst -n, which reverts to the nouveau drivers (open source) and removes the nvidia packages and the boot parameter.

But it could be that the changes mentioned on the Arch wiki are helpful in this case.

I will check about this…

NovaStorm93 · November 16, 2023, 1:15am

My EndeavourOS install is kinda old and is still using mkinitcpio + grub.

Edit: disregard, I installed dracut.

NovaStorm93 · November 16, 2023, 1:28am

This didn’t solve it, unfortunately. Same framebuffer error.

Svartis · November 16, 2023, 1:56am

I’m sorry to hear this.
Is there any option in your BIOS settings to turn off the Nvidia GPU? Have you tried to boot with the AMD GPU as the only option, to make sure that the Nvidia GPU really is the culprit?

NovaStorm93 · November 16, 2023, 2:18am

supergfxctl was normally how I disabled the GPU, and installing it does disable it, but it also prevents the GPU from loading at all. It (along with all other GPU switchers) also shows some very strange behavior, probably related to the root issue: it won’t switch modes, and the daemon hangs while trying to switch.

With the Nvidia GPU disabled, Xorg throws a different error instead:
(EE) no screens found(EE)

and still refuses to launch.

NovaStorm93 · November 16, 2023, 2:24am

Every time I see the terminal hang at
[ OK ] Started Simple Desktop Display Manager.

I get one step closer to moving to Pop!_OS and having System76 deal with my problems.

Svartis · November 16, 2023, 9:54am

Can you post the result of:

ls -a /etc/X11/xorg.conf.d/

and

ls -a /etc/X11/

and read this, perhaps this is a useful hint:

https://wiki.archlinux.org/title/NVIDIA/Troubleshooting#X_fails_with_"no_screens_found"_when_using_Multiple_GPUs

NovaStorm93 · November 16, 2023, 2:42pm

ls -a /etc/X11/xorg.conf.d/

00-keybord.conf

ls -a /etc/X11/

tigervnc
xinit
xorg.conf.backup
xorg.conf.d
xorg.conf.nvidia-xconfig-original

Log of xorg after specifying integrated graphics:
http://ix.io/4LEq

It throws a new error:
“parse_vt_settings: Cannot open /dev/tty0 (Permission denied)”

Still not running.

NovaStorm93 · November 16, 2023, 4:00pm

holy god this made no sense

I resorted to completely wiping all Xorg config files and installing supergfxctl again. For some reason this fixed it.

I am never touching nvidia drivers again

NovaStorm93 · November 16, 2023, 5:14pm

The Xorg config is very unstable; I might use this as an opportunity to move my files off and do a reinstall of EOS.

Svartis · November 16, 2023, 5:27pm

Many wikis note that it is better to have no Xorg config files at all with the nvidia drivers. That would have been my next suggestion (which is why I asked for the output of “ls -a …”): rename or move every config file in those directories. But I was busy at work, which is why I couldn’t answer until now.

But I’m glad to hear it’s working now.
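
For anyone landing here later, a rough sketch of that “move every config file aside” step, kept reversible by moving the files instead of deleting them. The function name stash_xorg_configs and the default backup path are made up for illustration. It only touches xorg.conf and the *.conf files in xorg.conf.d, and on a real system it needs to run as root.

```shell
# Move Xorg config files out of the way so X falls back to autodetection.
# $1: X11 config directory (default /etc/X11).
# $2: backup directory (default is a hypothetical path, adjust as needed).
stash_xorg_configs() (
    x11_dir="${1:-/etc/X11}"
    backup_dir="${2:-/root/xorg-config-backup}"
    mkdir -p "$backup_dir"
    for f in "$x11_dir"/xorg.conf "$x11_dir"/xorg.conf.d/*.conf; do
        [ -e "$f" ] || continue      # skip patterns that matched nothing
        mv -- "$f" "$backup_dir"/
        echo "moved $f"
    done
)

# Example call (as root): stash_xorg_configs /etc/X11 /root/xorg-config-backup
```

Nothing runs until the function is called, so it can be tried against a copy of /etc/X11 first. Moving the files back restores the previous state.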