Kernel panic after resume from suspend (6.8.1.arch1-1)

For some time now, with the stable kernel, I have had complete system freezes (I cannot switch to a TTY) a seemingly random interval (30 seconds to 3 minutes) after resuming from suspend. I hadn’t really investigated this until recently, with the 6.8.1.arch1-1 kernel, and it looks like a kernel panic is occurring. I’m unsure whether the issues I was having before are the same as the ones I am having now, but I do know that with 6.8.1.arch1-1, these freezes are caused by a kernel panic. I am not having these issues with the LTS kernel (6.6.22-1).

As I was writing this, I actually experienced the same kind of freeze with the lts kernel, except I don’t see any kernel panic in the logs, and it occurred long after resuming from suspend. I will also attach the log from that boot, but now I’m less convinced it is an issue with the kernel.

My system is a dual-GPU setup with an integrated AMD chip and a discrete NVIDIA chip. I use optimus-manager for PRIME support, and in all cases I have been testing in hybrid mode. I have enabled nvidia-suspend.service and nvidia-hibernate.service, but I noticed that I had not set the NVreg_PreserveVideoMemoryAllocations=1 nvidia module parameter. If needed, I can test with that parameter as well. However, I would be surprised if that is the issue, as the freeze occurs regardless of whether anything is running through PRIME, and the issue does not occur on the LTS kernel.
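
If anyone wants to check whether that parameter is actually in effect (it is normally set with a modprobe.d line such as `options nvidia NVreg_PreserveVideoMemoryAllocations=1`), here is a minimal sketch. It assumes the proprietary driver lists its current settings in `/proc/driver/nvidia/params` as `Name: value` lines without the `NVreg_` prefix; the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the proprietary driver exposes its module parameters here as
# "Name: value" lines, without the "NVreg_" prefix.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_nvidia_params(text: str) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict; lines without a colon are skipped."""
    params = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def preserve_vram_enabled(params: dict[str, str]) -> bool:
    # Hypothetical check: a value of "1" is taken to mean the setting is active.
    return params.get("PreserveVideoMemoryAllocations") == "1"


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        params = parse_nvidia_params(PARAMS_PATH.read_text())
        print("PreserveVideoMemoryAllocations enabled:", preserve_vram_enabled(params))
    else:
        print(f"{PARAMS_PATH} not found (is the nvidia module loaded?)")
```

The parsing is kept separate from the file access so it can be tried on a pasted copy of the file.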

I had considered posting this on the Arch forums, as I believe it is a kernel problem, but they don’t seem very friendly to posters from Arch derivatives. I don’t know if this is an appropriate place to report kernel bugs.

Here is my neofetch:

OS: EndeavourOS Linux x86_64
Host: Victus by HP Laptop 16-e0xxx
Kernel: 6.6.22-1-lts
Uptime: 19 hours, 25 mins
Packages: 1945 (pacman), 13 (flatpak)
Shell: bash 5.2.26
Resolution: 1920x1080
DE: Plasma 6.0.2
WM: KWin
Theme: [Plasma]
Icons: [Plasma], breeze-dark [GTK2/3]
Terminal: konsole
CPU: AMD Ryzen 5 5600H with Radeon Graphics (12) @ 4.280GHz
GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
GPU: NVIDIA GeForce RTX 3050 Mobile
Memory: 6367MiB / 7277MiB
Battery0: 99% [Not charging]
Active GPU: hybrid

Here is a journal from one of the boots where the issue occurred:
https://0x0.st/XsZ2.txt

Here is the journal from the boot where a freeze occurred with the lts kernel:
https://0x0.st/XsZ_.txt

Just now, I got a kernel panic with the stable kernel. Below is the log; unlike the previous freeze with stable, the kernel panic actually appears in it.
http://dpaste.com//26H9EYYNU

I’m getting kernel panics on boot with the 6.8.1 and 6.8.2 kernels too (both default and zen). I’m holding back on 6.7.9, which runs without problems here.
I also have a hybrid GPU setup, with the same NVIDIA GeForce RTX 3050. Maybe that’s the cause?
I reset the envycontrol configuration, but the problem still occurs.


OS: EndeavourOS Linux x86_64
Host: Dell G15 5530
Kernel: 6.7.9-zen1-1-zen
Uptime: 8 mins
Packages: 2488 (pacman), 16 (flatpak)
Shell: bash 5.2.26
Resolution: 1920x1080
DE: Plasma 6.0.2
WM: kwin
Theme: [Plasma], Breeze [GTK2/3]
Icons: breeze [Plasma], breeze [GTK2/3]
Terminal: konsole
CPU: 13th Gen Intel i5-13450HX (16) @ 4.600GHz
GPU: NVIDIA GeForce RTX 3050 6GB Laptop GPU
GPU: Intel Raptor Lake-S UHD Graphics
Memory: 2535MiB / 15671MiB

Does 6.7.9 work for you without freezes?

How do you manage render offloading? What’s the wiring setup of the laptop?

Also, I just got another strange freeze, but I’m unsure if it’s related, as it didn’t seem like a kernel panic. As usual, it happened some time after resuming from suspend. The screen stopped responding and my keyboard did nothing, but my mouse could still move, although the cursor was moving very strangely. Audio still played, and I couldn’t switch to a TTY. After switching to the touchpad, the cursor moved normally. I tried closing and reopening the lid, but after that the cursor was completely frozen. Audio was still playing.

Here are the logs. It seems like the issue was not a kernel panic.
http://dpaste.com//77XJDNN6R

I’ve had similar freezes on KDE and XFCE recently. I also have an Intel + NVIDIA hybrid setup with an RTX 3050. I have had only one kernel panic, but the freezes have been numerous and occur at any time, not just a few minutes after resuming from sleep. I thought it was a KDE issue, as almost all of the freezes occurred in games (CS2, GTA V, Apex Legends), but now I also get occasional freezes on XFCE. Audio keeps playing for a while but eventually stops, and the laptop remains unresponsive. The only solution appears to be rebooting with magic SysRq. I have tried the NVIDIA 545, 535, and current Vulkan beta drivers and have not noticed a difference.
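
Whether the magic SysRq reboot works at all depends on the `kernel.sysrq` value (readable from `/proc/sys/kernel/sysrq`). Per the kernel's SysRq documentation, 0 disables it, 1 enables every function, and larger values are a bitmask. The sketch below checks which keys of the common R-E-I-S-U-B sequence a given value allows; the key-to-bit mapping follows that documentation, and the function name is hypothetical.

```python
from pathlib import Path

# Bit values from the kernel's admin-guide SysRq documentation.
SYSRQ_BITS = {
    "r": 0x4,   # keyboard control (unraw)
    "e": 0x40,  # signal processes (SIGTERM to all)
    "i": 0x40,  # signal processes (SIGKILL to all)
    "s": 0x10,  # sync
    "u": 0x20,  # remount read-only
    "b": 0x80,  # reboot/poweroff
}


def allowed_keys(sysrq_value: int, keys: str = "reisub") -> dict[str, bool]:
    """Report, per key, whether the given kernel.sysrq value permits it."""
    if sysrq_value == 1:  # 1 enables every SysRq function
        return {k: True for k in keys}
    # 0 disables everything; any other value is treated as a bitmask.
    return {k: bool(sysrq_value & SYSRQ_BITS[k]) for k in keys}


if __name__ == "__main__":
    path = Path("/proc/sys/kernel/sysrq")
    if path.exists():
        value = int(path.read_text().strip())
        print(f"kernel.sysrq = {value}:", allowed_keys(value))
```

For example, a value of 244 (4 + 16 + 32 + 64 + 128) sets exactly the bits the R-E-I-S-U-B keys need.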

Do you have any journals you can provide where the freezes occurred? We all seem to have similar setups. Perhaps cross-posting on the NVIDIA dev forums would be helpful…

Not at the moment, unfortunately. I will make sure to check and share the journal next time it happens.

Here is another journal where a freeze occurred. This time, my screen suddenly went dark, with the cursor still showing (maybe from inactivity?). Then I closed the lid, reopened it, and got a kernel panic (blinking Caps Lock key), but the panic didn’t show up in the journal.

http://dpaste.com//E25QNEGVY

The really bizarre thing about these freezes is how varied their symptoms are: each time, the freeze happens in a slightly different way.

I decided to cross-post this on the NVIDIA dev forums and discovered that someone else there had the same issue in July of last year. Here is the post

I can’t find anything in the logs either. Hopefully the NVIDIA forums provide an answer.

As this is the second search result on Google for the term “kernel panic after resume nvidia”, I’m adding information here even though I’m on NixOS:

  • RTX3070
  • Ryzen 5 3500X (No integrated GPU)
  • Kernel 6.1.82
  • Nvidia proprietary driver 545.29.06 / 535.154.05
  • KDE Plasma 6 (Wayland / Xorg)

The machine freezes within a minute or so after every resume. When it freezes, switching to tty does not work, and SSH-ing also does not work. Most of the time it is not logged in journalctl --boot=-1, but sometimes there’s a suspicious kernel NULL pointer dereference in the log, pasted verbatim below:

“rtx3070” is the hostname of the machine.

Apr 04 21:13:23 rtx3070 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000032
Apr 04 21:13:23 rtx3070 kernel: #PF: supervisor instruction fetch in kernel mode
Apr 04 21:13:23 rtx3070 kernel: #PF: error_code(0x0010) - not-present page
Apr 04 21:13:23 rtx3070 kernel: PGD 1ccd36067 P4D 1ccd36067 PUD 0 
Apr 04 21:13:23 rtx3070 kernel: Oops: 0010 [#2] PREEMPT SMP NOPTI
Apr 04 21:13:23 rtx3070 kernel: CPU: 0 PID: 890 Comm: irq/72-nvidia Tainted: P      D    O       6.1.82 #1-NixOS
Apr 04 21:13:23 rtx3070 kernel: Hardware name: Gigabyte Technology Co., Ltd. B550M DS3H/B550M DS3H, BIOS F13h 04/23/2021
Apr 04 21:13:23 rtx3070 kernel: RIP: 0010:0x32
Apr 04 21:13:23 rtx3070 kernel: Code: Unable to access opcode bytes at 0x8.
Apr 04 21:13:23 rtx3070 kernel: RSP: 0018:ffff994d42287eb0 EFLAGS: 00010286
Apr 04 21:13:23 rtx3070 kernel: RAX: 0000000000000032 RBX: ffffffff8249b5a7 RCX: 00000000000001b0
Apr 04 21:13:23 rtx3070 kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff994d42287ed0
Apr 04 21:13:23 rtx3070 kernel: RBP: ffff9240c82d5780 R08: 00000000000e7ef0 R09: ffff994d42287de8
Apr 04 21:13:23 rtx3070 kernel: R10: 00000000000001d9 R11: ffffffff83f3b9c8 R12: ffff9240c82d60a4
Apr 04 21:13:23 rtx3070 kernel: R13: ffff9240c82dd201 R14: 0000000000000000 R15: 0000000000000000
Apr 04 21:13:23 rtx3070 kernel: FS:  0000000000000000(0000) GS:ffff9245e6a00000(0000) knlGS:0000000000000000
Apr 04 21:13:23 rtx3070 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 04 21:13:23 rtx3070 kernel: CR2: 0000000000000032 CR3: 0000000124130000 CR4: 0000000000350ef0
Apr 04 21:13:23 rtx3070 kernel: Call Trace:
Apr 04 21:13:23 rtx3070 kernel:  <TASK>
Apr 04 21:13:23 rtx3070 kernel:  ? __die_body.cold+0x1a/0x1f
Apr 04 21:13:23 rtx3070 kernel:  ? page_fault_oops+0x15a/0x2d0
Apr 04 21:13:23 rtx3070 kernel:  ? exc_page_fault+0x6a/0x150
Apr 04 21:13:23 rtx3070 kernel:  ? asm_exc_page_fault+0x22/0x30
Apr 04 21:13:23 rtx3070 kernel:  ? do_exit+0x357/0xac0
Apr 04 21:13:23 rtx3070 kernel:  ? task_work_run+0x59/0x90
Apr 04 21:13:23 rtx3070 kernel:  ? do_exit+0x357/0xac0
Apr 04 21:13:23 rtx3070 kernel:  ? make_task_dead+0x8d/0x90
Apr 04 21:13:23 rtx3070 kernel:  ? rewind_stack_and_make_dead+0x17/0x20
Apr 04 21:13:23 rtx3070 kernel:  </TASK>
Apr 04 21:13:23 rtx3070 kernel: Modules linked in: qrtr snd_seq_dummy snd_hrtimer snd_seq snd_seq_device af_packet bnep sch_fq_codel nv>
Apr 04 21:13:23 rtx3070 kernel:  stp llc kvm_amd ccp rng_core kvm irqbypass v4l2loopback(O) videodev mc fuse deflate efi_pstore configf>
Apr 04 21:13:23 rtx3070 kernel: CR2: 0000000000000032
Apr 04 21:13:23 rtx3070 kernel: ---[ end trace 0000000000000000 ]---
Apr 04 21:13:23 rtx3070 kernel: RIP: 0010:irq_thread+0xc3/0x1c0
Apr 04 21:13:23 rtx3070 kernel: Code: 02 00 48 89 df e8 6d 38 fa ff 4c 89 ee 4c 89 e7 e8 92 fe ff ff eb 0e f0 48 0f ba 75 00 00 72 23 e>
Apr 04 21:13:23 rtx3070 kernel: RSP: 0018:ffff994d42287ec0 EFLAGS: 00010246
Apr 04 21:13:23 rtx3070 kernel: RAX: 0000000000000001 RBX: 000000000000374f RCX: 0000000000000002
Apr 04 21:13:23 rtx3070 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9245e6a32880
Apr 04 21:13:23 rtx3070 kernel: RBP: ffff9240cf102cc0 R08: 0000000000000001 R09: 0000000000000000
Apr 04 21:13:23 rtx3070 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9240d447b400
Apr 04 21:13:23 rtx3070 kernel: R13: ffff9240cf102c80 R14: ffffffff825188d0 R15: ffff9240d447b558
Apr 04 21:13:23 rtx3070 kernel: FS:  0000000000000000(0000) GS:ffff9245e6a00000(0000) knlGS:0000000000000000
Apr 04 21:13:23 rtx3070 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 04 21:13:23 rtx3070 kernel: CR2: 0000000000000032 CR3: 0000000124130000 CR4: 0000000000350ef0
Apr 04 21:13:23 rtx3070 kernel: note: irq/72-nvidia[890] exited with irqs disabled
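
Since the oops only shows up in `journalctl --boot=-1` some of the time, a small filter can help pick it out of a long journal. This is a minimal sketch that assumes the previous boot's kernel messages were first saved to a text file (for example with `journalctl -k --boot=-1 > prev-boot.txt`); the marker list is an assumption, not an exhaustive set, and the function names are hypothetical.

```python
import re
import sys

# Assumed substrings that tend to mark a kernel crash report; not exhaustive.
CRASH_MARKERS = (
    "BUG: kernel NULL pointer dereference",
    "Oops:",
    "Kernel panic",
    "end trace",
)

# Captures the task name from an oops header such as "Comm: irq/72-nvidia".
COMM_RE = re.compile(r"Comm: (\S+)")


def find_crash_lines(lines):
    """Return (line_number, line) pairs for lines containing any crash marker."""
    return [
        (i, line.rstrip("\n"))
        for i, line in enumerate(lines, start=1)
        if any(marker in line for marker in CRASH_MARKERS)
    ]


def crashing_tasks(lines):
    """Return task names from 'Comm:' fields, in order of appearance."""
    return [m.group(1) for line in lines if (m := COMM_RE.search(line))]


def main(path):
    # errors="replace" keeps going if the dump contains invalid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    for lineno, line in find_crash_lines(lines):
        print(f"{lineno}: {line}")
    print("tasks:", crashing_tasks(lines))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run against the log above, the `Comm:` field would point at `irq/72-nvidia`, the NVIDIA driver's interrupt thread.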

The kernel panic bug does not seem to be related to Plasma 6 itself, as the kernel panic can be triggered under all these environments:

  • Plasma 6, both Wayland and Xorg
  • Wayfire
  • Suspending from the sddm UI without any DE running

Before downgrading the NVIDIA drivers to 535/545, I was on 550 and was experiencing another similar bug (random freezes), described by other people in this thread on the NVIDIA dev forums. I cannot recall whether the trigger was resuming from suspend, but the thread has several mentions that suspending helps trigger the freeze. I am not sure whether the bugs on 545 and 550 are the same.

Do you have the 3050?

I have a 3070.