System freeze with AMD ATI Radeon RX 6800/6800 XT / 6900 XT driver on Wayland

Hi Guys,

I have a problem I cannot understand. I ran the usual update a few days ago, and since then my system has been going crazy. I get system freezes with the AMD ATI Radeon RX 6800/6800 XT / 6900 XT driver on Wayland. I have now switched to X11, where the system works very badly, but I don’t get any more freezes (for now).

Some more info for you:

lspci | grep -i vga
2d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)

glxinfo | grep "OpenGL vendor\|OpenGL renderer\|OpenGL version"
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 6800 (radeonsi, navi21, LLVM 19.1.7, DRM 3.61, 6.14.6-arch1-1)
OpenGL version string: 4.6 (Compatibility Profile) Mesa 25.1.1-arch1.1

And this is my system:

Kernel: 6.14.6-zen1-1-zen 
DE: Plasma 6.3.5 
CPU: AMD Ryzen 9 5900X (24) @ 4.954GHz 
GPU: AMD ATI Radeon RX 6800/6800 XT / 6900 XT 
Memory: 4864MiB / 32008MiB 
OS: EndeavourOS Linux x86_64

Any ideas?

Thanks!

Maybe an issue with the newest Mesa?

Hi, thanks for the reply! I thought it was a problem only for Nvidia systems… so do I need to do a rollback?

OK, you have already narrowed down the problem. Nobody knows what the cause is yet. It looks like Nvidia GPUs are affected, but AMD can also be affected.

I don’t know what you should do. A downgrade may or may not be an improvement.

Before you do anything, you’d better wait and see if someone who knows more about it than I do reads this.

If you do try something, first just downgrade Mesa.

Edit: a little late…

Maybe the problem isn’t with Mesa then.

OK, I will try to downgrade Mesa and see what happens.
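For anyone following along, a downgrade on an Arch-based system usually means reinstalling an older package file from the pacman cache. Here is a minimal sketch, assuming the default cache directory /var/cache/pacman/pkg and that the old files have not been cleaned out. `list_cached` is a hypothetical helper name, and the name matching is only a heuristic.

```shell
# list_cached is a hypothetical helper: it prints the cached package files
# for one package name. The cache path is the pacman default (an assumption
# about your setup; pass another directory as the second argument).
list_cached() {
    pkg="$1"
    cache="${2:-/var/cache/pacman/pkg}"
    # Require the version to start with a digit so that "mesa" does not
    # also match "mesa-utils"; drop detached signature files.
    ls -1 "$cache"/"$pkg"-[0-9]*.pkg.tar.* 2>/dev/null | grep -v '\.sig$'
}

# Usage:
#   list_cached mesa
#   sudo pacman -U /var/cache/pacman/pkg/<one of the files printed above>
```

Packages built from the same source (for example vulkan-radeon and the lib32 variants) generally need to move to the same version together, so it may be worth listing those too before installing anything.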

OK, with the downgrade it’s even worse: now the system shows only a black screen when booting from the SSD. I booted the machine from an old snapshot and now it works… I really don’t know what the problem is.

With Mesa 25.0.5 and kernel 6.14.4-zen1-2-zen I have had no problems so far. I will probably wait a while before updating the system, until someone can figure out what the problem is…

I have now updated the whole system because I saw a new kernel version, so I ran a test.

This is the version I have now:

lspci | grep -i vga
2d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)

OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 6800 (radeonsi, navi21, LLVM 19.1.7, DRM 3.61, 6.14.7-zen2-1-zen)
OpenGL version string: 4.6 (Compatibility Profile) Mesa 25.1.1-arch1.1

This is the result:

journalctl -p 3 -b -1 --no-pager
mag 25 10:27:26 Linux systemd-modules-load[297]: Failed to find module 'v4l2loopback_dc'
mag 25 10:27:29 Linux systemd-modules-load[780]: Failed to find module 'v4l2loopback_dc'
mag 25 10:27:29 Linux systemd-udevd[819]: /usr/lib/udev/rules.d/39-libtvcontrol.rules:4 Unknown group 'plugdev', ignoring.
mag 25 10:27:29 Linux systemd-udevd[819]: /usr/lib/udev/rules.d/39-libtvcontrol.rules:4 Invalid value "/bin/sh -c 'for I in /sys/bus/usb/drivers/cytherm/$kernel*; do echo `basename $I` > /sys/bus/usb/drivers/cytherm/unbind ;  done'" for RUN (char 79: invalid substitution type), ignoring.
mag 25 10:27:31 Linux grub-btrfsd[1296]: [!] inotifywait was not found, exiting. Is inotify-tools installed?
mag 25 10:27:31 Linux bluetoothd[1281]: profiles/input/manager.c:load_config_file() Parsing /etc/bluetooth/input.conf failed: Key file does not start with a group
mag 25 10:27:31 Linux bluetoothd[1281]: profiles/audio/avctp.c:avctp_server_socket() setsockopt(L2CAP_OPTIONS): Invalid argument (22)
mag 25 10:27:32 Linux libvirtd[1415]: cannot open directory '/run/media/simone/6FCB-7CC2': File o directory non esistente
mag 25 10:27:32 Linux libvirtd[1415]: internal error: Failed to autostart storage pool '6FCB-7CC2': cannot open directory '/run/media/simone/6FCB-7CC2': File o directory non esistente
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x0018, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x0019, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001a, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001b, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001c, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001d, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001e, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-3: adapter quirk: no zero length (addr 0x001f, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-4: adapter quirk: no zero length (addr 0x0018, size 0, write)
mag 25 10:27:33 Linux kernel: i2c i2c-4: adapter quirk: no zero length (addr 0x0019, size 0, write)
mag 25 10:27:35 Linux dbus-broker-launch[1652]: Ignoring duplicate name 'org.freedesktop.Notifications' in service file '/usr/share//dbus-1/services/org.xfce.xfce4-notifyd.Notifications.service'
mag 25 10:27:40 Linux dbus-broker-launch[1766]: Ignoring duplicate name 'org.freedesktop.Notifications' in service file '/usr/share//dbus-1/services/org.xfce.xfce4-notifyd.Notifications.service'
mag 25 10:27:42 Linux plasmashell[2010]: qml: RESTORE LAST POSITION: 3800
mag 25 10:27:49 Linux bluetoothd[1281]: src/profile.c:record_cb() Unable to get Hands-Free Voice gateway SDP record: Host is down
mag 25 10:28:49 Linux bluetoothd[1281]: src/profile.c:record_cb() Unable to get Hands-Free Voice gateway SDP record: Host is down
mag 25 10:29:49 Linux bluetoothd[1281]: src/profile.c:record_cb() Unable to get Hands-Free Voice gateway SDP record: Host is down
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:43 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Current CMD queue read_ptr 161 write_ptr 162
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Start IWL Error Log Dump:
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Transport status: 0x0000004A, valid: 6
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Loaded firmware version: 77.f31a351f.0 cc-a0-77.ucode
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN       
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000A200 | trm_hw_status0
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | trm_hw_status1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x004F8D82 | branchlink2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x004E398C | interruptlink1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x004E398C | interruptlink2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00015332 | data1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x01000000 | data2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | data3
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xCBC03609 | beacon time
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x393C99F7 | tsf low
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000057D | tsf hi
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | time gp1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0B6FFF08 | time gp2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000001 | uCode revision type
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000004D | uCode version major
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xF31A351F | uCode version minor
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000340 | hw version
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00C89000 | board version
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x8041FC28 | hcmd
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x24020000 | isr0
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00400000 | isr1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x08F8400A | isr2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x04C3200C | isr3
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000002 | isr4
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x03BB001C | last cmd Id
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00015332 | wait_event
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x000000D4 | l2p_control
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00018014 | l2p_duration
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000007 | l2p_mhvalid
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000081 | l2p_addr_match
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000008 | lmpm_pmg_sel
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | timestamp
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x000058A8 | flow_handler
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Start IWL Error Log Dump:
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Transport status: 0x0000004A, valid: 7
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x20000066 | NMI_INTERRUPT_HOST
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | umac branchlink1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x80455D7A | umac branchlink2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xC0081614 | umac interruptlink1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xC00807D0 | umac interruptlink2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x01000000 | umac data1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xC00807D0 | umac data2
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000000 | umac data3
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000004D | umac major
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xF31A351F | umac minor
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0B6FFF06 | frame pointer
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0xC0887E90 | stack pointer
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00A1019C | last host cmd
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00110029 | isr status reg
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: IML/ROM dump:
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000003 | IML/ROM error/state
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000581F | IML/ROM data1
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Fseq Registers:
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x60000000 | FSEQ_ERROR_CODE
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x80290021 | FSEQ_TOP_INIT_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00050008 | FSEQ_CNVIO_INIT_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000A503 | FSEQ_OTP_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x80000003 | FSEQ_TOP_CONTENT_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00100530 | FSEQ_CNVI_ID
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000532 | FSEQ_CNVR_ID
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00100530 | CNVI_AUX_MISC_CHIP
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00000532 | CNVR_AUX_MISC_CHIP
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00050008 | FSEQ_PREV_CNVIO_INIT_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00290021 | FSEQ_WIFI_FSEQ_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x00290021 | FSEQ_BT_FSEQ_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: 0x000000F0 | FSEQ_CLASS_TP_VERSION
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: UMAC CURRENT PC: 0x80472b1c
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: LMAC1 CURRENT PC: 0xd0
mag 25 10:30:44 Linux kernel: iwlwifi 0000:29:00.0: Device error - SW reset
mag 25 10:30:48 Linux kernel: snd_hda_intel 0000:2d:00.1: Unable to change power state from D3hot to D0, device inaccessible
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:48 Linux kernel: snd_hda_intel 0000:2d:00.1: CORB reset timeout#2, CORBRP = 65535
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:48 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kwin_wayland[1780]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
mag 25 10:30:49 Linux kwin_wayland[1780]: kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues
mag 25 10:30:49 Linux kwin_wayland[1780]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:49 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:50 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kwin_wayland[1780]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
mag 25 10:30:51 Linux kwin_wayland[1780]: kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/-/issues
mag 25 10:30:51 Linux kwin_wayland[1780]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!
mag 25 10:30:51 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is in hanged state, failed to send smu message!


After this crash I went back to X11, because I don’t want to restore a snapshot again. For now, everything works fine on X11.
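A log like the one above is easier to read once the repeated kernel lines are collapsed into counts. The following is a minimal sketch, not a standard tool: `summarize_journal` is a hypothetical helper that reads journalctl-style lines on stdin and assumes kernel messages carry the " kernel: " prefix seen in the output above.

```shell
# summarize_journal is a hypothetical helper, not part of systemd.
# It keeps the lines matching a pattern (default: amdgpu), strips the
# timestamp/host prefix up to "kernel: ", and counts identical messages.
summarize_journal() {
    pattern="${1:-amdgpu}"
    grep -- "$pattern" \
        | sed -E 's/^.* kernel: //' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (kernel messages of priority "err" and above from the previous boot):
#   journalctl -k -p 3 -b -1 --no-pager | summarize_journal amdgpu
```

The most frequent message ends up on the first line, which makes it easier to see which error came first in volume and to quote it in a bug report.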

@1sn0m3 I am also facing a screen freeze issue on my system:

KDE Plasma login screen freeze issue.

If possible, can you please check whether it’s the same issue you faced, and tell me how you fixed it?

Sorry, but I only have an AMD GPU, so I don’t think the problem is the same. Anyway, for now I have worked around it by switching from the vulkan-radeon driver to amdvlk. I don’t know if you have already tested different Nvidia drivers, but for now I don’t have any other solution.

I’m here with Nvidia, and I’m facing similar constant plasmashell crashes, 2-3 times a day. Plasma just crashes out of nowhere: the panel disappears, the system freezes, then it relaunches itself (I’m not using any custom widgets or anything like that, just to clarify). Plasma worked for me for a long time until recently. About two weeks ago, I guess, there was a major Plasma update or something, and then things started falling apart. So you’re not alone.

Edit: the most annoying thing is that I can’t trace what is causing the crash, because no logger on earth, not even drkonqi, records any crash dump, so it’s gonna be glhf for team KDE fixing it :smiley:

I don’t know about all the other errors, but this one is probably related to PSR (Panel Self Refresh).

You may try a boot option:

amdgpu.dcdebugmask=0x12

Or

amdgpu.dcdebugmask=0x412

And/or

amdgpu.runpm=0

(This last one is for dGPUs and can be combined with the PSR-disabling options above.)
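In case it helps, here is a minimal sketch of how such an option could be added, assuming the system boots with GRUB and reads its kernel command line from GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (a systemd-boot install keeps the command line elsewhere). `add_kernel_param` is a hypothetical helper and /tmp/grub.edit is just an example path. It edits a copy of the file so nothing is changed until you copy it back yourself.

```shell
# add_kernel_param is a hypothetical helper: it appends PARAM to the
# GRUB_CMDLINE_LINUX_DEFAULT line of FILE unless it is already there.
# Assumes the value is quoted and the line has no trailing comment.
add_kernel_param() {
    file="$1"
    param="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0    # already present, nothing to do
    fi
    sed -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=['\"])(.*)(['\"])\$|\1\2 $param\3|" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage (GRUB assumed; keep a backup of the original file):
#   cp /etc/default/grub /tmp/grub.edit
#   add_kernel_param /tmp/grub.edit amdgpu.dcdebugmask=0x12
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   sudo cp /tmp/grub.edit /etc/default/grub
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
# After a reboot, confirm the kernel actually received the option:
#   grep -o 'amdgpu[^ ]*' /proc/cmdline
```

For a one-off test without touching any file, the same parameter can also be typed onto the kernel line from the GRUB menu editor for a single boot.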

The first two options above may not be optimal, and the option accepts other values.

Here I copy part of a different post from last October outlining the details:
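The value is a bitmask, so different hex numbers simply switch off different display features. A minimal sketch of decoding it follows. The bit names and values are how I understand enum DC_DEBUG_MASK in the kernel's drivers/gpu/drm/amd/include/amd_shared.h, so treat them as an assumption and check the header of the kernel you actually run. `decode_dcdebugmask` is a hypothetical helper and only covers the three bits used here.

```shell
# decode_dcdebugmask is a hypothetical helper. Bit values are assumed from
# enum DC_DEBUG_MASK (amd_shared.h); verify them against your kernel source.
decode_dcdebugmask() {
    mask=$(( $1 ))
    [ $(( mask & 0x2 ))   -ne 0 ] && echo "DC_DISABLE_STUTTER"
    [ $(( mask & 0x10 ))  -ne 0 ] && echo "DC_DISABLE_PSR"
    [ $(( mask & 0x400 )) -ne 0 ] && echo "DC_DISABLE_REPLAY"
    return 0
}

# Usage:
#   decode_dcdebugmask 0x12     # prints DC_DISABLE_STUTTER and DC_DISABLE_PSR
#   decode_dcdebugmask 0x412    # additionally prints DC_DISABLE_REPLAY
```

Under that reading, 0x12 is 0x10 + 0x2, and 0x412 adds 0x400 on top, which is why the two values are close variants of each other.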

It’s probably related to one of these bugs.

It started happening around the beginning of the year: the GPU would reset/hang once or twice a day. I tried all the workarounds mentioned in the issues (even the ones @cscs mentioned here), but none of them fixed it. After 4 months I ended up switching back to an Nvidia GPU because it made my system too unpredictable and it was that annoying.

There are literally dozens of bugs filed related to this issue;

Some are dGPU-only and some point to kernel 6.6, while mine didn’t start happening until about 6.10 or so, again some time last year. And I can reliably use the 6.6 kernel without any issue.

In some cases disabling PSR worked. In others some other combination helped but didn’t completely end the freezes. In still other cases folks used a different/new ddcutil package.

I was just trying to help point out some of the common/easy workarounds.

(For me none had worked, but currently the amdgpu.dcdebugmask=0x412 variant has kept the 6.15 kernel stable for a whole day. Still observing. EDIT: Ha, minutes after posting this I had to do the TTY > restart dance. :crying_cat:)

:open_mouth: I didn’t know there were that many. When I went searching specifically after running into this issue, I came across these two. I’m thinking it’s a kernel regression, because of the two I linked one is from kernel 6.6 and the other from 6.12, and it started happening for me around the beginning of the year or the end of last year.

I know, so was I. I wasn’t trying to come off as knowing better, just sharing my experience of this bug with my RX 7900XTX.

:sob: I tried that one too, that sucks. I switched back to an Nvidia GPU around the beginning of April because after 4 months without a fix I just couldn’t deal with it anymore.

The odd thing is I know someone who has the exact same GPU and is not having the issue. The only difference is that that person has an AM4 system and I have an AM5 system. I tried to look for commonalities but eventually gave up.

I haven’t looked at Nvidia bugs, but I do have to say it looks like a cesspool of bugs and I’ve never had an Nvidia gpu that caused my system to become this unpredictable. I hope they find the problem soon as well as a fix for everyone that is running into this issue.

So, I found something that worked for me and outlined everything in a longer post.
