After four or five months of these random crashes, I need help solving the issue. In February, when I tried out CachyOS for a week, I got random crashes, which persisted when I came back to EOS.
Today it has crashed 4 times already: once when I was opening a window, once when I switched icons on the taskbar, and once when the lock screen appeared and I was not doing anything at all, so it is completely random.
The journalctl output is always the same, or nothing is logged at all:
Apr 20 15:04:05 saeniv-b660mprors systemd[850]: Started app-org.kde.konsole-8315.scope.
Apr 20 15:04:13 saeniv-b660mprors systemd[850]: Started Dolphin - File Manager.
Apr 20 15:04:23 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:23 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:23 saeniv-b660mprors kwin_wayland[890]: kwin_libinput: Libinput: event3 - Sharkoon Technologies GmbH Sharkoon Light² 100: client bug: event processing lagging behind by 334ms, your system is >
Apr 20 15:04:23 saeniv-b660mprors kwin_wayland[890]: kwin_libinput: Libinput: client bug: timer event3 debounce: scheduled expiry is in the past (-250ms), your system is too slow
Apr 20 15:04:23 saeniv-b660mprors kwin_wayland[890]: kwin_libinput: Libinput: client bug: timer event3 debounce short: scheduled expiry is in the past (-263ms), your system is too slow
Apr 20 15:04:23 saeniv-b660mprors kwin_wayland[890]: kwin_libinput: Libinput: WARNING: log rate limit exceeded (5 msgs per 3600000ms). Discarding future messages.
Apr 20 15:04:23 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:23 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:24 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:24 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:24 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:24 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:24 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:25 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Apr 20 15:04:33 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Apr 20 15:04:36 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Apr 20 15:04:36 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Apr 20 15:04:37 saeniv-b660mprors kernel: snd_hda_intel 0000:03:00.1: Unable to change power state from D3hot to D0, device inaccessible
Apr 20 15:04:41 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Apr 20 15:04:41 saeniv-b660mprors kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
-- Boot f075a9441e0a479fb8a2b4b80ac890a4 --
I updated the BIOS, tried different GPU drivers, switched from the no-name NVMe to a Samsung SSD, and tried different Mesa versions. I tried everything I found.
Maybe switching from KDE to another desktop environment would help. I am really frustrated, because the PC is barely usable for daily work. During games I have no crashes, which is odd.
Thank you very much. I googled so much and read so many threads, but I was searching for the wrong topic. Sorry for creating a new post.
I applied the fix and will test it out.
The same question was also already answered here:
I think so… it's preventing the GPU from going into a low-power state at idle, or something like that? I'm hoping it works, as I am doing a lot of guessing. It's a bit of a trial-and-error method when one doesn't really know 100% what is causing the problem. A stab in the dark, as they say!
So I was not at home yesterday, but today I had one crash, with a different report, but still a crash. I hope it was just a one-time thing, but I will see tomorrow, when I will use the PC more.
Hello,
Today I have used the PC for a whole day, and it worked without issues.
amdgpu.ppfeaturemask=0xfff73fff
Works great. Thank you. The reduced version with:
amdgpu.ppfeaturemask=0xffff7fff
which should disable only PP_GFXOFF_MASK, did not work for me. So thank you very much for your advice. I will add it as a permanent kernel parameter to my system.
One question still crosses my mind: why is my system not working without the parameter?
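For anyone wondering what the two masks actually switch off, the cleared bits can be listed by testing each bit of the value. This is only a minimal Python sketch: the helper name `cleared_bits` is made up for illustration, the mask is assumed to be 32 bits wide, and the only bit name taken from this thread is PP_GFXOFF_MASK (0x8000 in the kernel's amd_shared.h, as far as I know).

```python
def cleared_bits(mask: int) -> list[str]:
    """Return, as hex strings, every bit that is 0 in a 32-bit mask."""
    # Assumption: ppfeaturemask is treated as 32 bits, all features on = 0xffffffff.
    return [hex(1 << bit) for bit in range(32) if not mask & (1 << bit)]


# The reduced mask that did not help here:
print(cleared_bits(0xFFFF7FFF))  # ['0x8000']

# The mask that works on this system:
print(cleared_bits(0xFFF73FFF))  # ['0x4000', '0x8000', '0x80000']
```

So the working mask clears two more bits (0x4000 and 0x80000) in addition to GFXOFF. I believe those are PP_OVERDRIVE_MASK and PP_GFX_DCS_MASK in current kernels, but please check amd_shared.h for your kernel version before relying on that.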
I will leave the thread open for another two weeks just to be certain.
It's likely a UEFI firmware (BIOS) issue. Not all manufacturers do a great job of implementing this. A BIOS update or a kernel update may eventually fix it. There is nothing wrong with using kernel parameters to deal with these kinds of issues.
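After making the parameter permanent, it may be worth confirming that it really reaches the kernel on the next boot. A small Python sketch that looks for it in /proc/cmdline; the function names `find_param` and `current_value` are hypothetical, and how the parameter gets onto the command line (GRUB, systemd-boot, etc.) depends on your setup and is not covered here.

```python
from pathlib import Path

PARAM = "amdgpu.ppfeaturemask"


def find_param(cmdline: str, name: str = PARAM):
    """Return the value of `name=value` from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


def current_value(path: str = "/proc/cmdline"):
    """Read the running kernel's command line; None if the file is missing."""
    p = Path(path)
    return find_param(p.read_text()) if p.exists() else None


# Prints the active value (e.g. the mask from this thread) or None if not set.
print(current_value())
```

If this prints None after a reboot, the parameter was most likely not written to the bootloader configuration that is actually in use.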