Laptop freezes with blinking capslock LED (2)

This is the second time I am creating a thread about this issue. The previous thread can be found here. In summary, my laptop freezes from time to time. I have been made aware that the Nvidia graphics driver could be the cause, and the solution according to this thread is to switch to the LTS kernel and install the 535 Nvidia drivers. I have done all that. Today, I was updating my laptop when I noticed that the capslock LED was flashing. I then realized that the laptop had frozen. Luckily, I had made a backup and managed to restore the system. I have become fed up with this issue. Sometimes my laptop (Thinkpad P52) doesn’t wake from sleep, and I have to do a manual restart, losing some data in the process. The part I don’t understand is that the freezing leaves no trace in journalctl. Surely, there must be a way to know what exactly is causing the problem? Are there solutions to these issues (preventing the laptop from freezing, and having a way to diagnose the reason for the freezing)?

Maybe you could try setting up the services for power management.

https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html

Edit: I haven’t had this issue on my desktop with nvidia using the current 555.xx drivers

Edit2: This may only help with sleep, suspend, hibernation and resume. On my desktop I don’t have any of these services enabled, but I don’t use hibernation, suspend or sleep.

Thanks for your reply. Just to reiterate. I face two types of freezes:

  1. The first one is on wake from sleep. This usually happens when I try to wake the laptop with only its own display after having put it to sleep with two external monitors connected. The external monitors are wired directly to the Nvidia GPU, which is why I believe the issue might be coming from the Nvidia drivers. However, it doesn’t always happen, making it hard to reproduce.

  2. The second issue is that the laptop sometimes freezes when I am doing updates, specifically when it is running dracut commands. Again, this doesn’t always happen, making it hard to reproduce. I ran a 6-hour memtest and got no errors.

I wish I could be more helpful. I have read that some users report the P52 having freezing issues, but on Windows. You can find the reports on the Lenovo site.

Here’s what happens when the laptop freezes while waking from sleep. During this time, pressing capslock toggles the LED, which tells me the laptop is still somewhat responsive. But once I try to switch to a terminal (Ctrl+Alt+F1-4), I can no longer toggle the capslock LED. Could this be a power management issue with Nvidia?

My second issue is that none of these freezes are recorded in journalctl. journalctl records everything up to just before the laptop freezes. Is there something else I can check to understand why the laptop is freezing?
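A hard freeze usually cuts the journal off abruptly, so it can help to look at the last few lines written before each reboot. The following is a minimal sketch, assuming journalctl output in its usual form, where each new boot is introduced by a `-- Boot <id> --` separator line. The function name tail_before_boot is made up for illustration.

```shell
# Print the last N lines (default 5) that were logged before each
# "-- Boot <id> --" separator found on stdin, followed by the separator.
tail_before_boot() {
    awk -v n="${1:-5}" '
        /^-- Boot [0-9a-f]+ --$/ {
            start = (count > n) ? count - n + 1 : 1
            for (i = start; i <= count; i++) print buf[i]
            print            # the separator line itself
            count = 0        # start collecting the next boot
            next
        }
        { buf[++count] = $0 }
    '
}

# Possible usage on a real system (not run here):
#   journalctl --no-pager | tail_before_boot 10
#   journalctl --no-pager -b -1 -n 20    # last 20 lines of the previous boot
```

If the last lines before the separator look perfectly ordinary, the freeze probably happened before anything could be flushed to disk, which matches what is described above.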

Can you try installing the normal kernel with nvidia-open-dkms and observe what happens?

Also, have you followed all the steps from https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks#Preserve_video_memory_after_suspend

and https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks#Driver_persistence

@ricklinux, I was looking at the power management method you suggested, but it seems that that is the default behavior on my system. That is, nvidia-suspend.service and nvidia-resume.service are already enabled.

I tried to manually turn on the nvidia-suspend.service, as you can see in the log below. The laptop froze in exactly the same way as it sometimes does when resuming from sleep. I don’t know what to make of this information, but at least it confirms that the Nvidia drivers might be causing the problem.

Aug 14 16:32:57 P50 sudo[8732]: pam_systemd_home(sudo:auth): New sd-bus connection (system-bus-pam-systemd-home-8732) opened.
Aug 14 16:33:01 P50 sudo[8732]:  medwatt : TTY=pts/2 ; PWD=/home/medwatt ; USER=root ; COMMAND=/usr/bin/systemctl enable --now nvidia-resume.service
Aug 14 16:33:01 P50 sudo[8732]: pam_unix(sudo:session): session opened for user root(uid=0) by medwatt(uid=1000)
Aug 14 16:33:01 P50 systemd[1]: Reload requested from client PID 8741 ('systemctl') (unit user@1000.service)...
Aug 14 16:33:01 P50 systemd[1]: Reloading...
Aug 14 16:33:01 P50 systemd[1]: Reloading finished in 229 ms.
Aug 14 16:33:01 P50 systemd[1]: Starting Discard unused blocks on filesystems from /etc/fstab...
Aug 14 16:33:01 P50 systemd[1]: Starting NVIDIA system resume actions...
Aug 14 16:33:01 P50 suspend[8797]: nvidia-resume.service
Aug 14 16:33:01 P50 systemd[1]: Starting Daily man-db regeneration...
Aug 14 16:33:01 P50 logger[8797]: <13>Aug 14 16:33:01 suspend: nvidia-resume.service
Aug 14 16:33:01 P50 systemd[1]: nvidia-resume.service: Deactivated successfully.
Aug 14 16:33:01 P50 systemd[1]: Finished NVIDIA system resume actions.
Aug 14 16:33:01 P50 sudo[8732]: pam_unix(sudo:session): session closed for user root
Aug 14 16:33:08 P50 sudo[8843]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-8843) opened.
Aug 14 16:33:08 P50 sudo[8843]:  medwatt : TTY=pts/2 ; PWD=/home/medwatt ; USER=root ; COMMAND=/usr/bin/systemctl enable --now nvidia-suspend.service
Aug 14 16:33:08 P50 sudo[8843]: pam_unix(sudo:session): session opened for user root(uid=0) by medwatt(uid=1000)
Aug 14 16:33:08 P50 systemd[1]: Reload requested from client PID 8846 ('systemctl') (unit user@1000.service)...
Aug 14 16:33:08 P50 systemd[1]: Reloading...
Aug 14 16:33:08 P50 systemd[1]: Reloading finished in 210 ms.
Aug 14 16:33:08 P50 systemd[1]: Starting NVIDIA system suspend actions...
Aug 14 16:33:08 P50 suspend[8903]: nvidia-suspend.service
Aug 14 16:33:08 P50 logger[8903]: <13>Aug 14 16:33:08 suspend: nvidia-suspend.service
Aug 14 16:33:09 P50 systemd[1]: nvidia-suspend.service: Deactivated successfully.
Aug 14 16:33:09 P50 systemd[1]: Finished NVIDIA system suspend actions.
Aug 14 16:33:09 P50 sudo[8843]: pam_unix(sudo:session): session closed for user root
Aug 14 16:33:18 P50 systemd[1]: man-db.service: Deactivated successfully.
Aug 14 16:33:18 P50 systemd[1]: Finished Daily man-db regeneration.
Aug 14 16:33:18 P50 systemd[1]: man-db.service: Consumed 11.540s CPU time, 197.6M memory peak.
-- Boot b73a97755c6b4a15b0c43a0a1e86c85e --
Aug 14 16:34:43 P50 kernel: microcode: updated early: 0xf4 -> 0xf8, date = 2024-02-01
Aug 14 16:34:43 P50 kernel: Linux version 6.6.45-1-lts (linux-lts@archlinux) (gcc (GCC) 14.2.1 20240805, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Sun, 11 Aug 2024 14:02:12 +0000
Aug 14 16:34:43 P50 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=8ca92686-f8d0-47b2-80a1-5d4b5ebf7c2c rw nowatchdog nvme_load=YES loglevel=3
Aug 14 16:34:43 P50 kernel: BIOS-provided physical RAM map:

It seems you are missing the NVreg_PreserveVideoMemoryAllocations=1 kernel module parameter.

How did you know this?

From your boot kernel parameters in the log.

And I am not sure, but you also seem to be missing nvidia_drm.modeset=1.

Actually, I have it:

GRUB_DEFAULT='0'
GRUB_TIMEOUT='5'
GRUB_DISTRIBUTOR='EndeavourOS'
GRUB_CMDLINE_LINUX_DEFAULT="nowatchdog nvme_load=YES loglevel=3 nvidia-drm.modeset=1"
# GRUB_CMDLINE_LINUX_DEFAULT='nowatchdog nvme_load=YES loglevel=3 intel_pstate=disable'
GRUB_CMDLINE_LINUX=""
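Note that the kernel command line in the log above does not contain nvidia-drm.modeset=1 even though the GRUB defaults file does, which suggests the running boot entry had not picked up the change yet. A rough way to spot such differences is sketched below, assuming the defaults live in /etc/default/grub. The helper name missing_params is hypothetical, and it does a plain string comparison, so it does not know that the kernel treats `-` and `_` in module names as equivalent.

```shell
# Print every configured parameter that is absent from the running command line.
#   $1: configured parameters (e.g. the value of GRUB_CMDLINE_LINUX_DEFAULT)
#   $2: the running kernel command line (e.g. the contents of /proc/cmdline)
missing_params() {
    mp_want=$1
    mp_running=$2
    for mp_p in $mp_want; do
        case " $mp_running " in
            *" $mp_p "*) ;;                      # present, nothing to report
            *) printf '%s\n' "$mp_p" ;;          # configured but not active
        esac
    done
}

# Compare the real files when they are available.
if [ -r /etc/default/grub ] && [ -r /proc/cmdline ]; then
    configured=$( . /etc/default/grub && printf '%s' "${GRUB_CMDLINE_LINUX_DEFAULT-}" )
    missing_params "$configured" "$(cat /proc/cmdline)"
fi
```

Any parameter it prints is in the defaults file but not in effect for the current boot, in which case regenerating grub.cfg and rebooting would be the first thing to try.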

Uh, you are using GRUB. I don’t like it, it’s too complicated for me, but check what you have inside /etc/kernel/cmdline and try regenerating the initramfs with sudo dracut --regenerate-all, I think. Somebody please correct me if I am wrong; I have never regenerated the initramfs manually on EOS.

Here’s what I’ve done so far:

  1. Created the file /etc/modprobe.d/nvidia-power-management.conf with the following:
 options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp
  2. Rebuilt initramfs:
sudo mkinitcpio -P
  3. Rebooted
~ ❯ cat /proc/cmdline                        
BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=8ca92686-f8d0-47b2-80a1-5d4b5ebf7c2c rw nowatchdog nvidia-drm.modeset=1 nvme_load=YES loglevel=3
~ ❯ nvidia-smi                                                       
Wed Aug 14 17:21:04 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2000                   Off | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P8              N/A / ERR! |     97MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       758      G   /usr/lib/Xorg                                95MiB |
+---------------------------------------------------------------------------------------+
 ❯ modinfo -F parm nvidia | grep NVreg
NVreg_ResmanDebugLevel: (int)
NVreg_RmLogonRC: (int)
NVreg_ModifyDeviceFiles: (int)
NVreg_DeviceFileUID: (int)
NVreg_DeviceFileGID: (int)
NVreg_DeviceFileMode: (int)
NVreg_InitializeSystemMemoryAllocations: (int)
NVreg_UsePageAttributeTable: (int)
NVreg_EnablePCIeGen3: (int)
NVreg_EnableMSI: (int)
NVreg_TCEBypassMode: (int)
NVreg_EnableStreamMemOPs: (int)
NVreg_RestrictProfilingToAdminUsers: (int)
NVreg_PreserveVideoMemoryAllocations: (int)
NVreg_EnableS0ixPowerManagement: (int)
NVreg_S0ixPowerManagementVideoMemoryThreshold: (int)
NVreg_DynamicPowerManagement: (int)
NVreg_DynamicPowerManagementVideoMemoryThreshold: (int)
NVreg_EnableGpuFirmware: (int)
NVreg_EnableGpuFirmwareLogs: (int)
NVreg_OpenRmEnableUnsupportedGpus: (int)
NVreg_EnableUserNUMAManagement: (int)
NVreg_MemoryPoolSize: (int)
NVreg_KMallocHeapMaxSize: (int)
NVreg_VMallocHeapMaxSize: (int)
NVreg_IgnoreMMIOCheck: (int)
NVreg_NvLinkDisable: (int)
NVreg_EnablePCIERelaxedOrderingMode: (int)
NVreg_RegisterPCIDriver: (int)
NVreg_EnableResizableBar: (int)
NVreg_EnableDbgBreakpoint: (int)
NVreg_RegistryDwords: (charp)
NVreg_RegistryDwordsPerDevice: (charp)
NVreg_RmMsg: (charp)
NVreg_GpuBlacklist: (charp)
NVreg_TemporaryFilePath: (charp)
NVreg_ExcludedGpus: (charp)
NVreg_DmaRemapPeerMmio: (int)
NVreg_RmNvlinkBandwidth: (charp)

How do I know that I have set it up the right way?

sudo cat /sys/module/nvidia_drm/parameters/modeset

should return Y

It returned Y.
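The modeset check only covers nvidia_drm. To see whether the NVreg_PreserveVideoMemoryAllocations and NVreg_TemporaryFilePath options from the modprobe.d file were picked up, one option is to read the driver's own settings file. This is a sketch under the assumption that the proprietary driver exposes its settings as `Name: value` lines in /proc/driver/nvidia/params, with the NVreg_ prefix dropped. The helper name nv_param is made up.

```shell
# Print the value of one setting from "Name: value" text on stdin.
nv_param() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Query the live driver when the file is available.
if [ -r /proc/driver/nvidia/params ]; then
    nv_param PreserveVideoMemoryAllocations < /proc/driver/nvidia/params
    nv_param TemporaryFilePath < /proc/driver/nvidia/params
fi
```

With the configuration from the steps above, the first value would be expected to be 1 and the second to point at /var/tmp. Anything else would suggest the modprobe.d file is not being applied.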

Should I stick to the 535 driver for now and observe it for a few days or move to the latest?

The latest are broken (every version from 550 on). More precisely, it is not the Nvidia drivers that are broken but something between the kernel and a systemd service. Better to wait for 560; maybe it will be better. That is as far as kernel panics go. As for the trouble with sleep/hibernation, I don’t know; I never managed to get it to work, and I haven’t really tried hard because I don’t need it. Personally, on Arch I am on nouveau and it works quite well; on EOS I use nvidia-open-dkms. The open drivers are becoming quite good, and with 560 they will become completely open, I think.

Thanks. I’ll hang on a bit on 535 and hope what I did fixes the issue. Hopefully, this is my last comment on this thread.

@medwatt
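If you are using GRUB, you can add nvidia.NVreg_PreserveVideoMemoryAllocations=1 (on the kernel command line the option needs the nvidia. module prefix) to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run the update GRUB command: sudo grub-mkconfig -o /boot/grub/grub.cfg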

I don’t know whether it will help, but you could try.
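For anyone who prefers not to edit the file by hand, the change could be previewed with a small filter like the one below. It is only a sketch: it assumes the GRUB_CMDLINE_LINUX_DEFAULT line ends with its closing quote, as in the file posted earlier, and it uses the module-prefixed spelling nvidia.NVreg_PreserveVideoMemoryAllocations=1, which is how module options are normally written on the kernel command line. The function name add_grub_param is hypothetical.

```shell
# Read GRUB defaults on stdin and write them to stdout with parameter $1
# appended to GRUB_CMDLINE_LINUX_DEFAULT, unless that line already contains it.
add_grub_param() {
    awk -v p="$1" '
        /^GRUB_CMDLINE_LINUX_DEFAULT=/ && index($0, p) == 0 {
            q = substr($0, length($0), 1)                 # closing quote character
            $0 = substr($0, 1, length($0) - 1) " " p q    # re-close with the same quote
        }
        { print }
    '
}

# Preview only; nothing is written back (not run here):
#   add_grub_param nvidia.NVreg_PreserveVideoMemoryAllocations=1 < /etc/default/grub
```

The output can be compared against the original before anything is copied into place, and grub-mkconfig still has to be run afterwards for the change to reach grub.cfg.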

@ricklinux, @Ultima_Thulee, the laptop failed to wake from sleep today. I had to hard-reboot. So none of the above worked.

All you needed to look at was if the services are running.

systemctl status nvidia-suspend.service

systemctl status nvidia-hibernate.service

systemctl status nvidia-resume.service

Edit: I’m not sure why it didn’t wake from sleep, but the Nvidia GPU may not be the issue either. :thinking:
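The three checks can be rolled into one loop. In the log posted earlier, these units finish and deactivate right after starting, so their status will normally show them as inactive; whether they are enabled is probably the more telling state. A minimal sketch, with report_units as a made-up helper name:

```shell
# Units named in this thread.
nvidia_units="nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service"

# Print "unit: state" for each unit, using "systemctl is-enabled".
report_units() {
    for unit in $nvidia_units; do
        # is-enabled prints the state even when it exits non-zero (e.g. "disabled");
        # fall back to "unknown" only when it printed nothing at all.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || state=${state:-unknown}
        printf '%s: %s\n' "$unit" "$state"
    done
}

if command -v systemctl >/dev/null 2>&1; then
    report_units
fi
```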