Nvidia graphics card freezes laptop when waking from sleep (AMD integrated/Nvidia discrete)

Hi!

I don’t know since which kernel version, but on a fresh install of EndeavourOS both the LTS and the regular Linux kernel cause freezes, with the journal full of errors.

I often get either a black screen or a frozen login screen. Neither the mouse nor the keyboard responds; the screen is completely frozen. The only way out is to force a shutdown with the power button.
Sadly it’s random; I can’t reproduce it consistently.

The laptop is in hybrid mode, with the AMD graphics card in use most of the time, except for gaming.

maj 11 18:39:31 swebow-xmgcoreczne21 DiscordCanary[1242]: 2023-05-11T16:39:31.525Z [Modules] No module updates available.
maj 11 18:39:43 swebow-xmgcoreczne21 kscreenlocker_greet[35371]: Qt: Session management error: networkIdsList argument is NULL
maj 11 18:39:43 swebow-xmgcoreczne21 kscreenlocker_greet[35371]: kf.kirigami: Failed to find a Kirigami platform plugin
maj 11 18:39:43 swebow-xmgcoreczne21 kscreenlocker_greet[35371]: file:///usr/share/plasma/look-and-feel/org.kde.breeze.desktop/contents/components/VirtualKeyboard.qml:8:1: module >
maj 11 18:39:43 swebow-xmgcoreczne21 plasmashell[1070]: qt.qpa.clipboard: QXcbClipboard::setMimeData: Cannot set X11 selection owner
maj 11 18:39:43 swebow-xmgcoreczne21 plasmashell[1070]: qt.qpa.clipboard: QXcbClipboard::setMimeData: Cannot set X11 selection owner
maj 11 18:40:55 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:00 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:05 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:06 swebow-xmgcoreczne21 DiscordCanary[1738]: [0511/184106.906562:ERROR:directory_reader_posix.cc(42)] opendir /home/swebow/.config/discordcanary/Crashpad/attachments/>
maj 11 18:41:10 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:15 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:20 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:25 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:30 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:35 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:40 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:45 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:50 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:41:55 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
maj 11 18:42:00 swebow-xmgcoreczne21 kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:2:0:4040
-- Boot aef0e849bcbd440faebcc54016ab28d7 --
maj 11 18:42:20 swebow-xmgcoreczne21 kernel: Linux version 6.3.1-arch2-1 (linux@archlinux) (gcc (GCC) 13.1.1 20230429, GNU ld (GNU Binutils) 2.40.0) #1 SMP PREEMPT_DYNAMIC Wed, 10>
maj 11 18:42:20 swebow-xmgcoreczne21 kernel: Command line: BOOT_IMAGE=/@/boot/vmlinuz-linux root=UUID=e540b74e-b603-42b8-aeb6-8338dc2d5d09 rw rootflags=subvol=@ nowatchdog nvme_lo>
maj 11 18:42:20 swebow-xmgcoreczne21 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'

Anyone got any ideas? I’ve seen at least one other person with a similar issue on the Garuda forum, but with no solution. I feel a bit at a loss since I can’t seem to find anything about this issue. I’ve reinstalled the Nvidia drivers to make sure they’re okay.

EDIT: Captured a better journal log from before the laptop went to sleep - https://0x0.st/HN89.txt
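For anyone collecting similar logs, here is a minimal sketch of counting the nvidia-modeset errors in a journal dump. The helper name count_modeset_errors is made up for illustration; the journalctl invocation in the comment assumes the freeze happened during the previous boot.

```shell
# Count "nvidia-modeset: ERROR" lines in journal text read from stdin.
# count_modeset_errors is a hypothetical helper name, not an existing tool.
count_modeset_errors() {
    grep -c 'nvidia-modeset: ERROR'
}

# On the affected machine, after rebooting from a freeze (assumption: the
# freeze happened in the previous boot, hence -b -1; -k limits output to
# kernel messages):
#   journalctl -b -1 -k --no-pager | count_modeset_errors
```

A non-zero count for the boot that ended in a freeze points at the same failure as in the log above.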

Can you turn off fTPM in UEFI?
Also add these parameters to the kernel command line:
"iommu=pt processor.max_cstate=5 amd_pstate=passive"

Then wait for kernel 6.3.2, which is on its way.

Sadly I can’t control the fTPM. It’s a very limited BIOS, imho. XMG doesn’t seem to enable the advanced options, even though there is a more extensive BIOS for the TongFang laptops.

Here is what they look like - https://winraid.level1techs.com/t/solved-how-to-unlock-bios-options-of-rebranded-tongfang-chassis-systems/32999/2

There are “hacks”, but I’m not sure whether they work on my laptop, and I’d rather not mess with the BIOS.

My GRUB config now looks like this:

GRUB_CMDLINE_LINUX_DEFAULT='nowatchdog nvme_load=YES resume=UUID=8058f85a-8f86-445b-8cc8-5e017c2f7147 loglevel=3 iommu=pt processor.max_cstate=5 amd_pstate=passive'
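Editing /etc/default/grub only takes effect once the GRUB config is regenerated and the machine rebooted. As a rough sketch, the check below reports which of the suggested parameters are missing from a kernel command line. The function name missing_params is made up for illustration, and the grub.cfg path in the comment assumes the usual Arch/EndeavourOS location.

```shell
# Print every requested parameter that is NOT present in the given
# kernel command line. missing_params is a hypothetical helper name.
# Usage: missing_params "<cmdline>" param...
missing_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                  # present as a whole word
            *) printf '%s\n' "$param" ;;      # missing
        esac
    done
}

# On the real system (assumption: GRUB config lives at /boot/grub/grub.cfg):
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
#   reboot, then:
#   missing_params "$(cat /proc/cmdline)" iommu=pt processor.max_cstate=5 amd_pstate=passive
```

No output means all three parameters reached the running kernel.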

This fixed it for me:

nVidia resume from suspend

sudo nano /etc/modprobe.d/nvidia-power-management.conf

Then add the following line

options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp

Lenovo Legion 5.
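A non-interactive version of the same step might look like the sketch below. The function name nvidia_pm_options is made up for illustration. The systemd units in the comments are the ones NVIDIA's power-management documentation pairs with this option; whether they are already enabled on a given install is an assumption to verify.

```shell
# Print the modprobe options line from the post above.
# nvidia_pm_options is a hypothetical helper; the arguments are optional
# overrides for the two values (defaults: 1 and /var/tmp).
nvidia_pm_options() {
    printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=%s NVreg_TemporaryFilePath=%s\n' \
        "${1:-1}" "${2:-/var/tmp}"
}

# On the real system:
#   nvidia_pm_options | sudo tee /etc/modprobe.d/nvidia-power-management.conf
#   sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
# If the nvidia modules are included in the initramfs, regenerate it with
# whatever tool the install uses, then reboot.
```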

Nice!

I’ll give the previous solution a spin first. If it doesn’t help, I’ll give yours a spin.
I will report back :)

Sadly didn’t work.

https://0x0.st/HNMG.txt

Testing @xircon’s method now. I’m a bit skeptical since I don’t have an nvidia-power-management.conf, so I’m not even sure it’s being used.

I actually meant you should create it; it didn’t exist for me either.

I’ve just created it and rebooted, and now I’m waiting to see how it goes. There are a lot fewer errors in the journal at the moment.

No dice. Did you modprobe anything?

https://0x0.st/HNux.txt

firewalld-sysctls.conf nvidia-power-management.conf tuxedo_keyboard.conf
[swebow@swebow-xmgcoreczne21 modprobe.d]$ sudo modprobe nvidia-power-management
modprobe: FATAL: Module nvidia-power-management not found in directory /lib/modules/6.3.1-arch2-1
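The FATAL message is expected: nvidia-power-management.conf is an options file that is read when the nvidia module loads, not a module, so modprobe cannot load it by name. A sketch of checking whether the option actually took effect follows. The function name nvidia_param is made up for illustration, and the "Name: value" layout of /proc/driver/nvidia/params is assumed.

```shell
# Print the value of one driver parameter from text in the (assumed)
# "Name: value" format of /proc/driver/nvidia/params, read from stdin.
# nvidia_param is a hypothetical helper name.
# $1 = parameter name without the NVreg_ prefix
nvidia_param() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# On the real system, with the nvidia module loaded:
#   nvidia_param PreserveVideoMemoryAllocations < /proc/driver/nvidia/params
# To see which options modprobe picks up from /etc/modprobe.d:
#   modprobe -c | grep NVreg
```

A value of 1 for PreserveVideoMemoryAllocations would indicate the file is being used; 0 suggests the module was loaded before the option was read, for example from an initramfs that was not regenerated.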

EDIT: I’m going to try EnvyControl to enable RTD3 and see if it makes any difference.
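As a rough sketch of what that involves: the envycontrol invocation in the comment assumes a version that supports the --rtd3 flag in hybrid mode, and the PCI address 0000:01:00.0 is only a placeholder for the Nvidia GPU. The function name gpu_runtime_status is made up for illustration; it reads the standard runtime power-management attributes from a PCI device directory in sysfs.

```shell
# Summarise runtime power management for a PCI device directory.
# gpu_runtime_status is a hypothetical helper name.
# $1 = device directory, e.g. /sys/bus/pci/devices/<address>
gpu_runtime_status() {
    dev_dir=$1
    control=$(cat "$dev_dir/power/control")          # "auto" or "on"
    status=$(cat "$dev_dir/power/runtime_status")    # e.g. "active", "suspended"
    printf 'control=%s status=%s\n' "$control" "$status"
}

# On the real system (assumptions: envycontrol supports --rtd3, and the
# Nvidia GPU sits at 0000:01:00.0; check the address with lspci):
#   sudo envycontrol -s hybrid --rtd3
#   reboot, then:
#   gpu_runtime_status /sys/bus/pci/devices/0000:01:00.0
```

With RTD3 working, the status should read "suspended" while nothing is using the Nvidia GPU.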

For what it is worth, I update about every 10 days, and the 6.3.2 kernel with the 530 Nvidia driver was causing all kinds of issues and ACPI errors on my box. I downgraded to the 6.3.1 kernel and everything is running fine. There are still a couple of ACPI errors, but none that adversely affect Bluetooth. I haven’t had time to research this beyond posts I have seen reporting that the 530 Nvidia driver and 6.x.x kernels have a bunch of issues. I even tried removing Nvidia with nvidia-inst -n and still had ACPI issues with 6.3.2.

All I am suggesting is to run downgrade linux linux-headers and see whether the problem goes away.

To conclude this story:
I’m not sure whether kernel 6.3.2 or enabling RTD3 fixed it, but the AMDGPU now comes up correctly on every boot (according to the journal), and Nvidia no longer tries to take over and freeze the system.

http://ix.io/4wMt
