There are numerous bug reports, and even more possibly related forum posts and anecdotes, about a surprisingly common problem on amdgpu systems.
Symptoms include an apparent ‘freeze’, though often only the graphics stop updating: audio keeps playing if it was already running, and a TTY is still reachable with patience.
It is often triggered by H.264 video, fullscreen video in browsers such as Firefox, or fullscreen games.
The logs typically contain messages like the following:
Pageflip timed out! This is a bug in the amdgpu kernel driver
Or
flip_done timed out
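If you want to check whether a saved journal contains either message, a minimal sketch like the one below may help. The function name find_flip_timeouts is made up for illustration, and it only does plain substring matching on the two messages quoted above.

```python
# The two message fragments quoted in this post.
PATTERNS = (
    "Pageflip timed out! This is a bug in the amdgpu kernel driver",
    "flip_done timed out",
)


def find_flip_timeouts(log_text):
    """Return every log line that contains one of the two messages."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern in line for pattern in PATTERNS)
    ]
```

One way to use it is to save the previous boot's journal with something like journalctl -b -1 > last-boot.log, read that file into a string, and pass the string to find_flip_timeouts.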
Kernel
First, kernel 6.6 (linux66) is often comparatively reliable without any further settings, but it is not a permanent solution, nor is it even an option for some systems.
PSR
It is quite possible that these issues are related to PSR, “Panel Self Refresh”, a power-saving feature. The amdgpu.dcdebugmask kernel parameter can be used to disable PSR and related features. Its value is a bitmask built from these flags:
enum DC_DEBUG_MASK {
DC_DISABLE_PIPE_SPLIT = 0x1,
DC_DISABLE_STUTTER = 0x2,
DC_DISABLE_DSC = 0x4,
DC_DISABLE_CLOCK_GATING = 0x8,
DC_DISABLE_PSR = 0x10,
DC_FORCE_SUBVP_MCLK_SWITCH = 0x20,
DC_DISABLE_MPO = 0x40,
DC_ENABLE_DPIA_TRACE = 0x80,
DC_ENABLE_DML2 = 0x100,
DC_DISABLE_PSR_SU = 0x200,
DC_DISABLE_REPLAY = 0x400,
DC_DISABLE_IPS = 0x800,
(some definitions: Core Driver Infrastructure — The Linux Kernel documentation)
In practice, that means something like one of these:
DC_DISABLE_PSR_SU
amdgpu.dcdebugmask=0x200
OR
DC_DISABLE_PSR_SU and DC_DISABLE_REPLAY
amdgpu.dcdebugmask=0x600
OR
DC_DISABLE_PSR (automatically also SU)
amdgpu.dcdebugmask=0x10
OR
DC_DISABLE_PSR (automatically also SU) & DC_DISABLE_STUTTER
amdgpu.dcdebugmask=0x12
These are listed in roughly increasing order of severity; any one of them may be enough to work around the issue.
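The mask values are just the individual flags ORed together, so other combinations can be worked out the same way. Here is a small sketch that does the arithmetic; the dictionary copies the values from the enum quoted earlier in this post, and the helper name dcdebugmask is only for illustration.

```python
# Flag values copied from the DC_DEBUG_MASK enum quoted in this post.
DC_DEBUG_MASK = {
    "DC_DISABLE_PIPE_SPLIT": 0x1,
    "DC_DISABLE_STUTTER": 0x2,
    "DC_DISABLE_DSC": 0x4,
    "DC_DISABLE_CLOCK_GATING": 0x8,
    "DC_DISABLE_PSR": 0x10,
    "DC_FORCE_SUBVP_MCLK_SWITCH": 0x20,
    "DC_DISABLE_MPO": 0x40,
    "DC_ENABLE_DPIA_TRACE": 0x80,
    "DC_ENABLE_DML2": 0x100,
    "DC_DISABLE_PSR_SU": 0x200,
    "DC_DISABLE_REPLAY": 0x400,
    "DC_DISABLE_IPS": 0x800,
}


def dcdebugmask(*flags):
    """OR the named flags together and format the result as a kernel parameter."""
    mask = 0
    for name in flags:
        mask |= DC_DEBUG_MASK[name]
    return f"amdgpu.dcdebugmask={mask:#x}"


print(dcdebugmask("DC_DISABLE_PSR_SU", "DC_DISABLE_REPLAY"))  # amdgpu.dcdebugmask=0x600
print(dcdebugmask("DC_DISABLE_PSR", "DC_DISABLE_STUTTER"))    # amdgpu.dcdebugmask=0x12
```

An unknown flag name raises KeyError, which is deliberate: a typo should not silently produce a mask that does nothing.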
dGPU Power Management
Users of dedicated GPUs may find the following option useful, again at the cost of power saving.
amdgpu.runpm=0
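After rebooting, it can be worth confirming that the parameter actually reached the kernel, since a typo in the bootloader config is easy to miss. This is a rough sketch that pulls every amdgpu.* option out of a kernel command line; the function names are made up, and it assumes the usual space-separated name=value format found in /proc/cmdline.

```python
def amdgpu_params(cmdline):
    """Return {name: value} for every amdgpu.* option in a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            name, value = token[len("amdgpu."):].split("=", 1)
            params[name] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel."""
    with open(path) as f:
        return f.read()
```

Calling amdgpu_params(read_cmdline()) on the affected machine should then show runpm (and dcdebugmask, if you set it) with the values you expect.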
Direct Scan Out
Finally, especially for integrated GPU users for whom none of the above stopped the amdgpu freeze, another possible ‘fix’ is to disable “Direct Scan-out”. This is not optimal, as direct scan-out is meant to increase performance and decrease latency. However, it is the only thing that worked for me, and it is preferable to a full graphics lockup. It can be controlled via a global environment variable (for example, set in /etc/environment), and the variable name varies with the compositor.
For kwin:
KWIN_DRM_NO_DIRECT_SCANOUT=1
For wlroots-based Wayland compositors:
WLR_SCENE_DISABLE_DIRECT_SCANOUT=1
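To double-check that the variable really is set in the file, something like the sketch below could be used. It assumes the file holds simple KEY=VALUE lines with # comments, which is a simplification of what /etc/environment accepts, and the function name scanout_overrides is just for illustration.

```python
# The two variable names mentioned in this post.
SCANOUT_VARS = ("KWIN_DRM_NO_DIRECT_SCANOUT", "WLR_SCENE_DISABLE_DIRECT_SCANOUT")


def scanout_overrides(env_text):
    """Return the direct scan-out variables set in KEY=VALUE style text."""
    found = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in SCANOUT_VARS:
            found[key] = value.strip().strip('"')
    return found
```

Reading /etc/environment into a string and passing it in should return a dictionary containing the variable with the value 1. Note that a change to that file normally only takes effect after logging out and back in.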
Hope that was helpful to someone else out there. <3