Game Crashes - RX 6800 XT - amdgpu: [gfxhub] page fault

I’ve been experiencing crashes with select titles, currently Sons of the Forest and Monster Hunter Wilds. I do note that Monster Hunter Wilds is known to be unstable at the moment.

The system remains stable under stress tests.

System information: https://0x0.st/8MOv.txt

Boot: https://0x0.st/8MOx.txt

The journal output:

Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32801)
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process MonsterHunterWi pid 5819 thread vkd3d_queue pid 5970
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x00008000e4400000 from client 0x1b (UTCL2)
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501430
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:53:49 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:00 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1046646, emitted seq=1046648
Mar 03 14:54:00 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Process information: process MonsterHunterWi pid 5819 thread vkd3d_queue pid 5873
Mar 03 14:54:00 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Mar 03 14:54:00 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Ring gfx_0.0.0 reset failure
Mar 03 14:54:01 McTherodin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Mar 03 14:54:01 McTherodin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Mar 03 14:54:01 McTherodin systemd-coredump[6239]: [🡕] Process 1631 (Xwayland) of user 1000 dumped core.

                                                   Stack trace of thread 1638:
                                                   #0  0x000073ea8bf055e7 n/a (n/a + 0x0)
                                                   #1  0x000073ea893e20e3 n/a (n/a + 0x0)
                                                   #2  0x000073ea893e55b3 n/a (n/a + 0x0)
                                                   #3  0x000073ea88edd8a4 n/a (n/a + 0x0)
                                                   #4  0x000073ea88f1271d n/a (n/a + 0x0)
                                                   #5  0x000073ea8bf7570a n/a (n/a + 0x0)
                                                   #6  0x000073ea8bff9aac n/a (n/a + 0x0)
                                                   ELF object binary architecture: AMD x86-64
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:01 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:02 McTherodin systemd-coredump[6237]: [🡕] Process 2091 (corectrl) of user 1000 dumped core.

                                                   Stack trace of thread 2508:
                                                   #0  0x000074a335a335e7 n/a (n/a + 0x0)
                                                   #1  0x000074a319fe20e3 n/a (n/a + 0x0)
                                                   #2  0x000074a319fe55b3 n/a (n/a + 0x0)
                                                   #3  0x000074a319add8a4 n/a (n/a + 0x0)
                                                   #4  0x000074a319b1271d n/a (n/a + 0x0)
                                                   #5  0x000074a335aa370a n/a (n/a + 0x0)
                                                   #6  0x000074a335b27aac n/a (n/a + 0x0)
                                                   ELF object binary architecture: AMD x86-64
Mar 03 14:54:04 McTherodin systemd-coredump[6240]: [🡕] Process 1778 (plasmashell) of user 1000 dumped core.
Mar 03 14:54:04 McTherodin systemd-coredump[6272]: [🡕] Process 3994 (Discord) of user 1000 dumped core.

                                                   Module discord_zstd.node without build-id.
                                                   Module notificationstate.node without build-id.
                                                   Stack trace of thread 171:
                                                   #0  0x00007d36152fedb4 n/a (n/a + 0x0)
                                                   #1  0x00007d36152a608e n/a (n/a + 0x0)
                                                   #2  0x00007d361528d882 n/a (n/a + 0x0)
                                                   #3  0x00007d358bfc7ca3 _ZSt11__terminatePFvvE (discord_utils.node + 0xbdca3)
                                                   #4  0x00007d358bfc7696 _ZN10__cxxabiv1L12failed_throwEPNS_15__cxa_exceptionE (discord_utils.node + 0xbd696)
                                                   #5  0x00007d358bfc762f __cxa_throw (discord_utils.node + 0xbd62f)
                                                   #6  0x00007d358bfc50eb _ZNSt4__Cr20__throw_system_errorEiPKc (discord_utils.node + 0xbb0eb)
                                                   #7  0x00007d358bfc51a3 _ZNSt4__Cr6thread4joinEv (discord_utils.node + 0xbb1a3)
                                                   #8  0x00007d358bf892b8 _ZN7discord2uv17ThreadedEventLoop8ShutdownEv (discord_utils.node + 0x7f2b8)
                                                   #9  0x00007d358bf20db7 _ZNSt4__Cr14__shared_count16__release_sharedB7v160000Ev (discord_utils.node + 0x16db>
                                                   #10 0x00007d36152a87e1 n/a (n/a + 0x0)
                                                   #11 0x00007d36152a88ae n/a (n/a + 0x0)
                                                   #12 0x00007d36157c3448 n/a (n/a + 0x0)
                                                   #13 0x00007d36157c378c n/a (n/a + 0x0)
                                                   #14 0x00007d36157c0aad n/a (n/a + 0x0)
                                                   #15 0x00007d36157b0e41 n/a (n/a + 0x0)
                                                   #16 0x00007d358bf84256 _ZN7discord5inputL15OnDisplayFDReadEP9uv_poll_sii (discord_utils.node + 0x7a256)
                                                   #17 0x000058b19fd16177 n/a (n/a + 0x0)
                                                   #18 0x000058b19fd05c15 n/a (n/a + 0x0)
                                                   #19 0x00007d358bf891d9 _ZN7discord2uv17ThreadedEventLoop10ThreadMainEPKcNSt4__Cr7promiseIvEE (discord_utils>
                                                   #20 0x00007d358bf89404 _ZNSt4__Cr14__thread_proxyB7v160000INS_5tupleIJNS_10unique_ptrINS_15__thread_structE>
                                                   #21 0x00007d36152fce0e n/a (n/a + 0x0)
                                                   #22 0x00007d36153817d4 n/a (n/a + 0x0)
                                                   ELF object binary architecture: AMD x86-64
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301431
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301431
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:12 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.1.0 timeout, but soft recovered
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.1.0 timeout, but soft recovered
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:22 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.1.0 timeout, but soft recovered
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32772)
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:  in process kwin_wayland pid 1525 thread kwin_wayla:cs0 pid 1575
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00301430
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Mar 03 14:54:32 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Mar 03 14:54:43 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1046657, emitted seq=1046660
Mar 03 14:54:43 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Process information: process plasmashell pid 6328 thread plasmashel:cs0 pid 6364
Mar 03 14:54:43 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Mar 03 14:54:43 McTherodin kernel: amdgpu 0000:0b:00.0: amdgpu: Ring gfx_0.0.0 reset failure
Mar 03 14:54:44 McTherodin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Mar 03 14:54:46 McTherodin systemd-coredump[6523]: [🡕] Process 6328 (plasmashell) of user 1000 dumped core.

                                                   Stack trace of thread 6364:
                                                   #0  0x000074f1a2c335e7 n/a (n/a + 0x0)
                                                   #1  0x000074f198de20e3 n/a (n/a + 0x0)
                                                   #2  0x000074f198de55b3 n/a (n/a + 0x0)
                                                   #3  0x000074f1988dd8a4 n/a (n/a + 0x0)
                                                   #4  0x000074f19891271d n/a (n/a + 0x0)
                                                   #5  0x000074f1a2ca370a n/a (n/a + 0x0)
                                                   #6  0x000074f1a2d27aac n/a (n/a + 0x0)
                                                   ELF object binary architecture: AMD x86-64
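
The decoded fields in the journal can be reproduced from the raw GCVM_L2_PROTECTION_FAULT_STATUS value, which helps when comparing faults across reports. Below is a minimal sketch, assuming a bit layout inferred from the kernel's own decoded lines above. The FIELDS table and the function name are illustrative and are not taken from a register specification.

```python
# Assumed (shift, width) layout of GCVM_L2_PROTECTION_FAULT_STATUS.
# Inferred from the decoded fields printed in the journal above; verify
# against your kernel source before relying on it.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw values copied from the journal output above.
    for raw in (0x00501430, 0x00301430, 0x00301431):
        fields = decode_fault_status(raw)
        print(f"{raw:#010x}", {k: hex(v) for k, v in fields.items()})
```

For 0x00501430 this gives PERMISSION_FAULTS 0x3, client ID 0xa and VMID 5, matching the first fault. 0x00301430 differs only in the VMID (3, the kwin_wayland faults).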

Any assistance would be greatly appreciated.

This probably has to do with some game platform settings, e.g. VRAM, frame generation, motion blur, or AMD FSR.

Running the game at its lowest settings without frame generation still results in the above-mentioned crash.
VRAM usage is around 30-40% at most.

The crashes do appear to be more frequent during multiplayer.

@scannerdarkly

According to ProtonDB - Sons Of The Forest, it looks like it works fine on Proton 9.0-4 with NVIDIA, but for AMD you may have to switch the game to Proton Experimental. Not sure if you have tried that yet?

With regard to ProtonDB - Monster Hunter Wilds, it just looks incredibly unstable and isn’t really working well for anyone, but again you could try switching to Proton Experimental if you haven’t already.

My first stop is usually ProtonDB. I’ve gone through the Proton versions and also tried GloriousEggroll’s releases. I remove the old prefix before trying a new version.

I’ve been looking at similar reports:

Similar issues

Power and boost clocks

The above threads concerning power and boost clocks seem to suggest a possible cause.

I am experiencing this same issue with Doom: The Dark Ages and Assassin’s Creed: Valhalla, but I can use my PC normally all day with no problem. It’s only in the past 2 months or so that I have noticed it. As a side note, I played For the King 2 for almost 8 hours one day and it was perfectly stable. Very frustrating.

What has helped my setup with certain titles (Baldur’s Gate 3) is to set my GPU’s power profile to Power Saving with CoreCtrl. I went from crashing consistently every 10-15 minutes to no crashes at all.

I’ve recently switched to GE-Proton 10-1 for Monster Hunter Wilds and enabled Wine-Wayland, and the results were pretty promising. Still need more testing, but the crashes might have been solved.

I’ve also added gpu_recovery=1 and lockup_timeout=1000 to my kernel parameters, although I don’t believe I’ve run into a situation where they’ve been triggered just yet. No crashes so far.

But yeah, it’s pretty difficult to troubleshoot since it can appear quite random and present as either a hardware failure or power supply issue.
I’ve undervolted my card and run stress tests without experiencing any issues. Also used frame generation technologies with certain titles and everything worked fine.

I started having this problem last week with my 6700XT (the same [gfxhub] page fault). It seems like the GPU is putting itself into low power/idle mode even when it’s going full bore in a game, taking out the game and display server.

After a little poking around, the guilty feature seems to be PP_GFXOFF_MASK.

This dynamically turns off the graphics engine to save power. It’s clearly malfunctioning. You can disable it with a kernel parameter.

amdgpu.ppfeaturemask=0xfffd3fff
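
To see which power features a given ppfeaturemask value actually switches off, list its zero bits. This is a minimal sketch. The bit names are an assumption based on my reading of the kernel's amd_shared.h and should be checked against your kernel source; the helper names are hypothetical.

```python
# Assumed bit-to-name mapping (from amd_shared.h as I understand it).
# Only the bits relevant to this mask are listed.
PP_FEATURE_BITS = {
    14: "PP_OVERDRIVE_MASK",
    15: "PP_GFXOFF_MASK",
    17: "PP_STUTTER_MODE",
}


def cleared_bits(mask: int, width: int = 32) -> list[int]:
    """Return the bit positions that are 0 in a mask of the given width."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


def describe_cleared(mask: int) -> list[str]:
    """Name the cleared bits where a name is known, else report the position."""
    return [PP_FEATURE_BITS.get(bit, f"bit {bit}") for bit in cleared_bits(mask)]


if __name__ == "__main__":
    print(describe_cleared(0xFFFD3FFF))
```

Under these assumptions, 0xfffd3fff clears bits 14, 15 and 17, so it disables more than GFXOFF alone.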

Some discussion here:

I also need help

Thanks for the suggestion. I’ve changed my boot parameters from what is suggested in the ArchWiki to amdgpu.ppfeaturemask=0xf7fff.

Been testing it for a few days now with no crashes. Think this issue is finally resolved for me. :partying_face:

I just had to say it. :sweat_smile:

First crash in a few days. Nothing noted in the journal, using journalctl -p err..alert, but I am getting the following errors with PROTON_LOG=1:

10685.619:0134:016c:warn:vkd3d-proton:d3d12_pipeline_state_init_graphics_create_info: DSV format is DXGI_FORMAT_UNKNOWN.
10685.678:0134:016c:warn:vkd3d-proton:d3d12_cached_pipeline_state_validate: PSO compatibility hash mismatch.
10688.593:0134:0168:warn:vkd3d-proton:vkd3d_validate_shader_io_signatures: No corresponding output signature element found for Normal0.
10689.854:0134:0164:warn:vkd3d-proton:vkd3d_validate_shader_io_signatures: No corresponding output signature element found for INTERPOLATOR0.
10690.109:0134:0158:warn:seh:virtual_unwind backtrace: 0000000001925264: unknown module.
10698.462:0134:0658:warn:threadname:NtSetInformationThread Thread renamed to L"wine_threadpool_worker"
10740.859:0134:0240:warn:vkd3d-proton:dxgi_vk_swap_chain_wait_worker: Incrementing frame latency semaphore beyond max latency. Did application forget to acquire? (new count = 6, max latency = 1)

Seems like a vkd3d or game issue… I think the previous solution might still be valid since I no longer get the error with which I opened this post?

Multiple crashes with the following error noted in the journal:

kwin_wayland[1684]: kwin_libinput: Libinput: client bug: timer event7 debounce: scheduled expiry is in the past (-58ms), your system is too slow
kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
kernel: snd_hda_intel 0000:0b:00.1: Unable to change power state from D3hot to D0, device inaccessible
kwin_wayland[1684]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver

Will make a bug report at https://gitlab.freedesktop.org/drm/amd/-/issues.
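
When collecting material for a bug report, it can help to count how often each failure signature shows up in a saved journal. The sketch below is a rough example. The signature patterns are taken from the messages quoted in this thread, and the category names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Patterns copied from messages seen in this thread; labels are illustrative.
SIGNATURES = {
    "page_fault": re.compile(r"\[gfxhub\] page fault"),
    "ring_timeout": re.compile(r"ring \S+ timeout"),
    "ring_reset_failure": re.compile(r"Ring \S+ reset failure"),
    "smu_no_response": re.compile(r"SMU: response:0xFFFFFFFF"),
    "pageflip_timeout": re.compile(r"Pageflip timed out"),
    "device_inaccessible": re.compile(r"device inaccessible"),
}


def summarize(lines) -> Counter:
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Example: journalctl -b -1 | python3 summarize.py
    for name, count in summarize(sys.stdin).most_common():
        print(f"{count:6d}  {name}")
```

Piping the journal of the crashed boot into it (for example with journalctl -b -1) gives a quick per-signature tally to paste alongside the full log.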

I’ve been experiencing this on and off for the last little while as well, so I thought I’d throw in what has worked as a workaround for these crashes for me. I’m now at several days of uptime with 40+ hours of gaming.

In my case at least, it was the system putting the driver into a low power state while it’s trying to run full bore (at least it looks that way with logging the power/clocks/dpm status during gaming). So disabling the runtime power management of amdgpu fixes it for me. This may cause increased power usage in laptops as it prevents the gpu from fully powering down though, but on a desktop the power draw difference is basically nothing with a 6900xt in my case.
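
One way to log the clock state during gaming is to read the amdgpu sysfs files and record which DPM level is marked active. This is a minimal sketch, assuming the GPU is card0 (it may be card1 on some systems) and that pp_dpm_sclk lists one level per line with the active one marked by a trailing asterisk. The function names are hypothetical.

```python
from pathlib import Path

# Assumed device path; adjust the card index for your system.
DEVICE_DIR = Path("/sys/class/drm/card0/device")


def active_level(pp_dpm_text: str):
    """Return (index, clock) of the level marked with '*', or None."""
    for line in pp_dpm_text.splitlines():
        line = line.strip()
        if line.endswith("*"):
            index, _, rest = line.partition(":")
            return int(index), rest.rstrip("*").strip()
    return None


def sample_once(device_dir: Path = DEVICE_DIR):
    """Read the current shader clock level, or None if the file is missing."""
    try:
        return active_level((device_dir / "pp_dpm_sclk").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active sclk level:", sample_once())
```

Calling sample_once() once a second from a loop and writing the result with a timestamp would show whether the clocks drop just before a crash.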

Adding this to the kernel parameters and regenerating the kernel images should be enough:

amdgpu.runpm=0
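
After rebooting, it is worth confirming that the parameter actually reached the kernel. The sketch below reads /proc/cmdline and lists every amdgpu.* option. The function name and the example command line in the comment are illustrative.

```python
from pathlib import Path


def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Extract amdgpu.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, _, value = token.partition("=")
            params[key[len("amdgpu."):]] = value
    return params


if __name__ == "__main__":
    # e.g. "... rw quiet amdgpu.runpm=0" -> {"runpm": "0"}
    cmdline = Path("/proc/cmdline").read_text()
    for name, value in amdgpu_params(cmdline).items():
        print(f"amdgpu.{name} = {value}")
```

If the option is missing from the output, the bootloader configuration was probably not regenerated or a different boot entry was used.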

I’m hesitant to say it will definitely solve everyone’s problems, as there seem to be several issues manifesting in the same or similar errors over the last couple of months. But it was a random shot in the dark for me too, and this particular sacrifice has appeased the tech gods on my end.

Thanks, will give this a try and report back. :+1:

Seemed to be more stable, but eventually crashed with the following errors:

journalctl:

org_kde_powerdevil[1934]: [ 1934] busno=6, All features that should not exist detected. Monitor does not indicate unsupported
kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:18 param:0x00000005 message:TransferTableSmu2Dram?
kernel: amdgpu 0000:0b:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:8 param:0x00000000 message:EnableSmuFeaturesLow?
kernel: i2c-designware-pci 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
kwin_wayland[1645]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
kernel: amdgpu 0000:0b:00.0: amdgpu: [smu_v11_0_auto_fan_control]Start smc FAN CONTROL feature failed!
kernel: amdgpu 0000:0b:00.0: amdgpu: [smu_v11_0_set_fan_control_mode]Set fan control mode failed!
kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3hot to D0, device inaccessible
kernel: snd_hda_intel 0000:0b:00.1: Unable to change power state from D3hot to D0, device inaccessible

Additional errors provided through PROTON_LOG=1:

warn:vkd3d-proton:dxgi_vk_swap_chain_wait_worker: Failed to increment swapchain semaphore. Did application forget to acquire?
warn:vkd3d-proton:vkd3d_native_sync_handle_release: Failed to release semaphore (#12a).

As mentioned elsewhere, this may be related to PSR (Panel Self Refresh).

Try the boot option

amdgpu.dcdebugmask=0x12

(I keep meaning to make a larger post about this, especially since there are a few variants .. but .. maybe later. :sweat_smile:)
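
For reference, the dcdebugmask value is a bit field, so 0x12 combines two flags. This is a minimal sketch of decoding it. The flag names and values are an assumption based on my reading of the DC_DEBUG_MASK enum in the kernel's amd_shared.h and only cover the low bits.

```python
# Assumed flag values (from amd_shared.h as I understand it); higher bits
# exist but are left out here.
DC_DEBUG_BITS = {
    0x1: "DC_DISABLE_PIPE_SPLIT",
    0x2: "DC_DISABLE_STUTTER",
    0x4: "DC_DISABLE_DSC",
    0x8: "DC_DISABLE_CLOCK_GATING",
    0x10: "DC_DISABLE_PSR",
}


def decode_dcdebugmask(mask: int) -> list[str]:
    """List the known debug flags that are set in the mask."""
    return [name for bit, name in DC_DEBUG_BITS.items() if mask & bit]


if __name__ == "__main__":
    print(decode_dcdebugmask(0x12))
```

Under these assumptions, 0x12 disables both memory stutter mode and PSR, while 0x10 alone would disable only PSR.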

Thanks, will add and test. :+1:

Still getting crashes, but it seems like there are no amdgpu errors in the journal anymore.

Please see the other possibilities here

Thanks, gonna work through them. :folded_hands: