I have very frequent (1-3 times a day) full system lockups (i.e. no input is registered any more, though sound mostly continues). I don’t know how to diagnose this.
These were generated after a reboot right after a freeze:
Installed Apps: http://ix.io/4FhV
gnome extensions: http://ix.io/4FfO
journalctl -b -1 -e: http://ix.io/4FhR
inxi -Fxz: http://ix.io/4FhS
vm.swappiness = 60
I have a suspicion that Steam might be involved, otherwise I can’t see a pattern (yet).
Anything else that would be helpful?
What I can see is:
Sep 03 18:01:19 finn-endeavor gameoverlayui: g_object_unref: assertion ‘G_IS_OBJECT (object)’ failed
Sep 03 18:01:20 finn-endeavor pkexec: pam_unix(polkit-1:session): session opened for user root(uid=0) by finn(uid=1000)
Sep 03 18:01:20 finn-endeavor gamemoded: ERROR: Could not inspect tasks for client ! Skipping ioprio optimisation.
Sep 03 18:01:29 finn-endeavor steam.desktop: reaping pid: 17921 – gameoverlayui
Sep 03 18:02:16 finn-endeavor /usr/lib/gdm-x-session: (WW) NVIDIA(0): WAIT (2-S, 17, 0x0090, 0x00053bb0, 0x00053c68)
Sep 03 18:02:23 finn-endeavor /usr/lib/gdm-x-session: (WW) NVIDIA(0): WAIT (1-S, 17, 0x0090, 0x00053bb0, 0x00053c68)
Sep 03 18:02:25 finn-endeavor chromium.desktop: [16140:1:0903/180225.568483:ERROR:command_buffer_proxy_impl.cc(320)] GPU state invalid after WaitForGetOffsetInRange.
Sep 03 18:02:25 finn-endeavor chromium.desktop: [3207:3207:0903/180225.573528:ERROR:gpu_process_host.cc(956)] GPU process exited unexpectedly: exit_code=512
Sep 03 18:02:26 finn-endeavor /usr/lib/gdm-x-session: (WW) NVIDIA(0): WAIT (2-S, 17, 0x008f, 0x00053bb0, 0x00053c9c)
Graphics driver crash? It’s up to date…
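To see how often these warnings show up across a whole boot rather than in one excerpt, a small script along these lines might help. It is only a sketch: the patterns are copied from the messages quoted in this thread, and the names `PATTERNS`, `count_gpu_warnings` and `read_previous_boot` are made up for illustration.

```python
import re
import subprocess
from collections import Counter

# Patterns taken from the warnings quoted in this thread (not an exhaustive list).
PATTERNS = {
    "nvidia_wait": re.compile(r"\(WW\) NVIDIA(\(\d+\))?: WAIT"),
    "channel_idle_timeout": re.compile(r"Wait for channel idle timed out"),
    "chromium_gpu_exit": re.compile(r"GPU process exited unexpectedly"),
}


def count_gpu_warnings(lines):
    """Count how many journal lines match each GPU-related pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def read_previous_boot():
    """Return the previous boot's journal (journalctl -b -1) as a list of lines."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling `count_gpu_warnings(read_previous_boot())` right after a freeze gives a rough count per message type, which can be compared against a boot that did not freeze.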
That is about the same crash I’ve been having…in my case, the crash is induced by shutting down BOINC…I can watch a memory corruption start right afterward…The only way I’ve been able to recover is to reboot right after BOINC stops. The crash is related to unloading a high-usage program out of VRAM.
Yes, this is a driver issue…I’ve narrowed it down to the combination of the current Mesa & Vulkan builds…I’m waiting for Mesa 23.2 to come out…should be in a couple of weeks. You could downgrade Mesa & Vulkan to see if that fixes it…I’m moving to an Intel Arc video card…I’ve fought Nvidia for far too many years now & I’m not fond of AMD. There is a thread on what is going to happen to Nvidia in the 6.6 kernel series & I don’t want to see it.
Take a look at this thread: Plans for Nvidia Proprietary Driver Going Forward from 6.6?
WTF, what kind of unacceptable bug is that? A crash after every game? Also, Nvidia drivers will stop working on Linux? Excuse me, what the fuck?
I’m not a gamer, but I suspect it may have something to do with GameMode and PAM?
Well—when I get a crash it’s not during games…it’s just when I shut off BOINC, which uses 95% of my available VRAM…the system will try to freeze. The message I get at that time is:
/usr/lib/gdm-x-session: (WW) NVIDIA: Wait for channel idle timed out.
At the same time, I can see my Conky Nvidia monitor will show the memory MHz at 0 & the GPU frequency goes to 32000 MHz…
This will happen VERY reliably when BOINC shuts off. This started happening with the current Mesa update.
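The same clock readings can be pulled without Conky by querying `nvidia-smi` directly. The following is a rough sketch, assuming the proprietary driver's `nvidia-smi` is on the PATH; the helper names and the 5000 MHz plausibility bound for the graphics clock are my own assumptions, not something taken from the driver.

```python
import subprocess


def parse_clocks(csv_line):
    """Parse one 'graphics, memory' CSV line from nvidia-smi into ints (MHz)."""
    graphics, memory = (int(field.strip()) for field in csv_line.split(","))
    return graphics, memory


def clocks_look_wrong(graphics_mhz, memory_mhz, max_plausible_mhz=5000):
    """Flag the symptom described above: memory clock at 0 or an absurd GPU clock.

    max_plausible_mhz is an assumed upper bound for the graphics clock only.
    """
    return memory_mhz == 0 or graphics_mhz > max_plausible_mhz


def read_clocks():
    """Read current graphics and memory clocks of the first GPU via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.gr,clocks.mem",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_clocks(result.stdout.splitlines()[0])
```

Running `clocks_look_wrong(*read_clocks())` in a loop while BOINC shuts down would show whether the bogus values appear before or after the "channel idle" warning.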
I ran a 30 min stress-test in Windows to verify that I don’t have a problem with my RTX 3070…it passed without any errors.
That is what I thought…I just bought an Intel Arc A750…about time to leave Nvidia & I like the looks of the Arc series—they are just starting to optimize the driver to get good performance out of it.
Take a look: https://www.phoronix.com/review/intel-anv-mod-boost
I have a 3080 RTX FE, I will surely not “leave nvidia” any time soon. I will rather go back to windows.
Can you install Xfce and try it again? Because for me it’s not happening there.
Well…I’m not really interested in xfce…I’ve got a workaround that works now…and my mind is set on Arc now.
Since, in my case, BOINC causes the problem when I just stop the application under a high GPU load, I have found that stopping the GPU load, waiting, and then closing the application works. If I stop the GPU load, wait, and then start up Steam, that also works. (BOINC has an option to auto-stop all running instances when it sees a high-load application start, but that no longer seems to work right; it used to.)
I know that this won’t help in your instance—but that is what is working for me at this time.
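The "stop the load, wait, then close" sequence could be made less of a guessing game by polling the GPU until it has actually settled. This is a minimal sketch under a few assumptions: `nvidia-smi` is available, GPU utilization is a good enough proxy for "the load is gone" (VRAM usage might be the better signal), and the threshold, interval and function names are all made up for illustration.

```python
import subprocess
import time


def read_gpu_utilization():
    """Return GPU utilization (percent) of the first GPU via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.splitlines()[0].strip())


def wait_until_idle(read_utilization=read_gpu_utilization, threshold=5,
                    settle_checks=3, interval=2.0, timeout=120.0,
                    sleep=time.sleep):
    """Wait until utilization stays at or below threshold for several reads.

    Returns True once settle_checks consecutive reads are calm,
    or False if the timeout is reached first.
    """
    calm = 0
    waited = 0.0
    while waited <= timeout:
        if read_utilization() <= threshold:
            calm += 1
            if calm >= settle_checks:
                return True
        else:
            calm = 0  # load came back, start counting again
        sleep(interval)
        waited += interval
    return False
```

The idea would be to suspend GPU work in BOINC first, call `wait_until_idle()`, and only close BOINC or start Steam once it returns `True`.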
Have you tried to downgrade Mesa to see if that works?