Suspend/sleep: nvidia-suspend fails on 560.35 drivers

Hello, I have been using EndeavourOS since last year and have been able to troubleshoot most issues by searching the forums or going through the ArchWiki. However, this issue is persistent and I have been unable to solve it.

I am using an NVIDIA GTX 980 Ti GPU and an ASUS X99-Deluxe motherboard.
Linux kernel: 6.6.57 LTS
Driver: nvidia-dkms 560.35.03
GRUB with dracut

The issue is this: whenever I try to suspend the system, the screen goes blank and the HDD LED shows activity, but afterwards the keyboard and mouse do not respond and cannot wake/resume the system. The system doesn't actually sleep at all, because I'm still able to SSH into it.

journalctl shows that nvidia-suspend.service fails to execute:

-- Boot cc22333996fe476b979073885b644220 --
Oct 20 19:07:53 max-desk systemd[1]: Starting NVIDIA system suspend actions...
Oct 20 19:07:53 max-desk suspend[2841]: nvidia-suspend.service
Oct 20 19:07:53 max-desk logger[2841]: <13>Oct 20 19:07:53 suspend: nvidia-suspend.service
Oct 20 19:07:54 max-desk systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=9/KILL
Oct 20 19:07:54 max-desk systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Oct 20 19:07:54 max-desk systemd[1]: Failed to start NVIDIA system suspend actions.

What I have tried so far:

  1. Added module options through a drop-in file, /etc/modprobe.d/nvidia.conf:
    options nvidia-drm modeset=1 fbdev=1
    options nvidia NVreg_PreserveVideoMemoryAllocations=1
    options nvidia NVreg_TemporaryFilePath=/var/tmp
    options nvidia NVreg_UsePageAttributeTable=1

  2. Blacklisted the following modules in /etc/modprobe.d/blacklist.conf:
    blacklist nouveau
    blacklist rivafb
    blacklist nvidiafb
    blacklist rivatv
    blacklist nv

  3. Added mem_sleep_default=deep to the kernel command line.

  4. dmesg | grep nvidia gives the following result:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=f9643ccf-307a-4f5a-bf21-032adc507330 rw nowatchdog nvme_load=YES resume=UUID=39d362ab-035e-45fa-ba45-23f3992c6b3f nvidia_drm.modeset=1 loglevel=3
[    0.081397] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=f9643ccf-307a-4f5a-bf21-032adc507330 rw nowatchdog nvme_load=YES resume=UUID=39d362ab-035e-45fa-ba45-23f3992c6b3f nvidia_drm.modeset=1 loglevel=3
[    8.014385] nvidia: loading out-of-tree module taints kernel.
[    8.014392] nvidia: module license 'NVIDIA' taints kernel.
[    8.014395] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    8.014396] nvidia: module license taints kernel.
[    8.351587] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[    8.353173] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=io+mem
[    8.499854] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[    8.507710] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  560.35.03  Fri Aug 16 21:21:48 UTC 2024
[    8.613648] nvidia-uvm: Loaded the UVM driver, major device number 235.
[    8.618536] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    9.539921] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[    9.540005] nvidia 0000:01:00.0: vgaarb: deactivate vga console
[    9.613167] fbcon: nvidia-drmdrmfb (fb0) is primary device
[    9.676217] nvidia 0000:01:00.0: [drm] fb0: nvidia-drmdrmfb frame buffer device
[ 5651.060669] warp-gui[33195]: segfault at 0 ip 0000790fd6610857 sp 00007ffebbb67290 error 4 in libnvidia-glcore.so.560.35.03[790fd6000000+c00000] likely on CPU 8 (core 2, socket 0)

  5. Disabled systemd's freezing of user sessions with /etc/systemd/system/systemd-suspend.service.d/disable_freeze_user_session.conf:
    [Service]
    Environment="SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false"

  6. Disabled session freezing on home lock with /etc/systemd/system/systemd-homed.service.d/override.conf:
    [Service]
    Environment="SYSTEMD_HOME_LOCK_FREEZE_SESSION=false"

  7. Checked the selected sleep state with cat /sys/power/mem_sleep:
    s2idle [deep]
Still, the system fails to suspend.
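
The module options in step 1 only matter if the loaded driver actually picked them up (with dracut, an initramfs that was not regenerated may still carry older settings). One way to check is to read the values back from the driver. This is a minimal sketch, assuming the driver lists its parameters in /proc/driver/nvidia/params as `Name: value` lines; the function name `show_nvidia_pm_params` is made up for illustration.

```shell
# Print the power-management-related parameters reported by the loaded nvidia module.
# The optional first argument overrides the file to read (handy for testing).
show_nvidia_pm_params() {
    params_file="${1:-/proc/driver/nvidia/params}"
    grep -E '^(PreserveVideoMemoryAllocations|TemporaryFilePath):' "$params_file"
}
```

Running `show_nvidia_pm_params` with no argument on the affected machine should print one line per parameter; if PreserveVideoMemoryAllocations shows 0 or the path is empty, the drop-in file is probably not being applied.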

I would like to add that I have also tried disabling the nvidia-suspend, nvidia-resume and nvidia-hibernate services, but that doesn't work either.
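
One likely reason disabling the services alone does not help: the NVRM message in the journal further down says that when PreserveVideoMemoryAllocations is set, suspend has to go through the driver's procfs interface, which is what nvidia-suspend.service does. Leaving the option at 1 while the unit is disabled is therefore expected to fail. The sketch below only checks that the two settings agree; `check_pm_consistency` is a hypothetical helper name, and it takes both values as arguments so it makes no assumptions about the system it runs on.

```shell
# Report whether the module option and the systemd unit state agree.
#   $1: value of PreserveVideoMemoryAllocations (0 or 1)
#   $2: output of `systemctl is-enabled nvidia-suspend.service` (enabled, disabled, ...)
# Example call on a live system:
#   check_pm_consistency \
#       "$(awk '/^PreserveVideoMemoryAllocations:/ {print $2}' /proc/driver/nvidia/params)" \
#       "$(systemctl is-enabled nvidia-suspend.service)"
check_pm_consistency() {
    preserve="$1"
    unit_state="$2"
    if [ "$preserve" = "1" ] && [ "$unit_state" != "enabled" ]; then
        echo "mismatch: PreserveVideoMemoryAllocations=1 but nvidia-suspend.service is $unit_state"
        return 1
    fi
    echo "ok"
}
```

If the services are to stay disabled, the matching change would be to drop NVreg_PreserveVideoMemoryAllocations=1 from the modprobe drop-in as well, rather than changing only one side.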

Same story here. It works every once in a while.
If Firefox and Postman are open when hibernating, it will most likely fail.

Yes, unfortunately. The surprising part is that it used to work before.

I have now optimized my startup time, so I simply shut down and boot every time instead.

journalctl logs:

Dec 07 15:14:31 max-desk systemd-logind[736]: The system will suspend now!
Dec 07 15:14:31 max-desk systemd[1]: Starting NVIDIA system suspend actions...
Dec 07 15:14:31 max-desk suspend[2100]: nvidia-suspend.service
Dec 07 15:14:31 max-desk logger[2100]: <13>Dec  7 15:14:31 suspend: nvidia-suspend.service
Dec 07 15:14:31 max-desk systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=9/KILL
Dec 07 15:14:31 max-desk systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Dec 07 15:14:31 max-desk kernel:  uvm_suspend.isra.0+0x94/0x190 [nvidia_uvm 0071ba02daf04645ab5cb7993743a5a3554d12e7]
Dec 07 15:14:31 max-desk kernel:  uvm_suspend_entry+0x7c/0x90 [nvidia_uvm 0071ba02daf04645ab5cb7993743a5a3554d12e7]
Dec 07 15:14:31 max-desk kernel:  nv_uvm_suspend+0x31/0x50 [nvidia 339d56dbf9ec80bf0579abfd9db4206a2ee24310]
Dec 07 15:14:31 max-desk kernel:  nv_procfs_write_suspend+0xef/0x170 [nvidia 339d56dbf9ec80bf0579abfd9db4206a2ee24310]
Dec 07 15:14:31 max-desk systemd[1]: Failed to start NVIDIA system suspend actions.
Dec 07 15:14:31 max-desk systemd-sleep[2135]: in suspend-then-hibernate operations or setups with encrypted home directories.
Dec 07 15:14:31 max-desk systemd-sleep[2135]: Performing sleep operation 'suspend'...
Dec 07 15:14:31 max-desk kernel: PM: suspend entry (s2idle)
Dec 07 15:14:32 max-desk kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 07 15:14:32 max-desk kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Dec 07 15:14:32 max-desk kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x50 [nvidia] returns -5
Dec 07 15:14:32 max-desk kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Dec 07 15:14:32 max-desk kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
Dec 07 15:14:32 max-desk kernel: PM: Some devices failed to suspend, or early wake event detected
Dec 07 15:14:32 max-desk kernel: PM: suspend exit
Dec 07 15:16:02 max-desk systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
Dec 07 15:17:32 max-desk systemd[1]: systemd-suspend.service: State 'stop-sigterm' timed out. Killing.
Dec 07 15:17:32 max-desk systemd[1]: systemd-suspend.service: Killing process 2192 (nvidia-sleep.sh) with signal SIGKILL.
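
Read together, the journal above suggests a two-stage failure: nvidia-suspend.service is killed with SIGKILL (the kernel trace ends in uvm_suspend), and when the kernel then runs the driver's own suspend callback it returns -5 (EIO), with the NVRM line giving PreserveVideoMemoryAllocations as the reason. To pull just those decisive lines out of a long journal, a small filter can help. This is only a sketch; the function name `suspend_failure_lines` and the patterns are illustrative choices, not part of any tool.

```shell
# Keep only the journal lines that show where a suspend attempt broke down.
# Reads journal text on stdin, for example:
#   journalctl -b --no-pager | suspend_failure_lines
suspend_failure_lines() {
    grep -E 'nvidia-suspend\.service: (Main process exited|Failed)|NVRM:|PM: .*(failed to suspend|returns -[0-9]+)'
}
```

Filtering this way keeps the unit failure, the NVRM explanation and the PM error codes next to each other, which makes it easier to compare attempts across boots.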

My /var/tmp has sufficient space, at 36 GB.
I have 24 GB of RAM and the NVIDIA card has 6 GB of video memory.
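
For anyone who wants to verify the free-space claim on their own machine, the check can be scripted. This is a rough sketch: `has_free_mib` is a hypothetical helper, and the 6144 MiB in the usage note is simply the 6 GB of video memory mentioned above, not a figure taken from the driver documentation.

```shell
# Succeed if the file system holding path $1 has at least $2 MiB available.
has_free_mib() {
    path="$1"
    need_mib="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $((need_mib * 1024)) ]
}
```

For example, `has_free_mib /var/tmp 6144 && echo "enough space"` checks the directory named in NVreg_TemporaryFilePath against the size of the card's memory.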