[rant] ring gfx_0.0.0 timeout is upon us

It’s that time of the year - the holidays - when you tax your AMD GPU a little bit more than usual, because maybe you want to play a game or do some other fancy stuff that is somewhat more demanding than drawing a DE window in a ten-year-old OpenGL 3.3 implementation.

It’s heart-warming to see that all the AMD bugs that have filled the Mesa bug tracker for years are still alive and kicking.

You can despise Nvidia, but you have to respect them. They deliver if you play by their rules. But AMD? They pretend and fail hard. Over and over and over. And let’s not even go into HIP/ROCm or the dark pit of video en-/decoding.

Hopefully Intel prevails in the discrete GPU market. There needs to be more competition.

[/rant over]

You ruined my joy! I just bought an overclocked RX 6750. :sob:

And I bought a new computer, a ThinkStation P620. It is using my tried and tested WX7100 PRO.

I am as joyful as ever … which doesn’t count much when looking back :frowning_face:

I have no issue with RX 5000 series.

  • Removed amdvlk and lib32-amdvlk
  • Removed xf86-video-amdgpu and the config /etc/X11/xorg.conf.d/20-amdgpu.conf
  • Installed mesa and set the environment variable:
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json
  • Kernel 6.1 added the module amdgpu

What does this do?

I didn’t see this card before. Was only looking at the 7900xt and 7900xtx

Edit: I like the 6750xt because its price point was fantastic. To get something better I would have to buy an Nvidia 3070 or 3070 Ti, probably the Ti. It depends on who you ask or which review you look at. Too many biased opinions.

not sure what you are on about.
Did not have any notable bugs and always had great performance on my RX480, after that on the Vega64 and now on my 6750XT … something on your system seems to be broken.

lol - their discrete GPUs are not even supported by their open source driver on Linux; the only way you can run them is on modesetting, and that is months after the release … way to go. They definitely won’t make it into my systems if they continue that way …

amdvlk is not installed by default and has weak performance - not sure why anyone would install it anyway. Removing xf86-video-amdgpu disables many features - not recommended and also not needed.
Simply install vulkan-radeon and lib32-vulkan-radeon and you won’t need to modify anything else.

mesa is installed by default …

Are you using the 6750 XT?

yes, heavily gaming on it since July - works great. Got the Sapphire Nitro+ one

Great… I was pretty sure this card was worth the deal. They had them on sale for $499 Cdn

When this environment variable is set, Steam and other applications use the AMD Vulkan driver you selected, e.g.:

  • /usr/share/vulkan/icd.d/radeon_icd.x86_64.json for vulkan-radeon

  • /usr/share/vulkan/icd.d/amd_icd64.json for amdvlk

  • /usr/share/vulkan/icd.d/amd_pro_icd64.json for vulkan-amdgpu-pro
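
To make the mapping above a bit more concrete, here is a minimal Python sketch that reports which of those ICD manifests exist on a system and builds an environment with VK_ICD_FILENAMES pointing at one of them. The function names (installed_icds, env_for_driver) and the icd_dir / base_env parameters are made up for illustration; only the paths and package names come from the list above.

```python
import os
from pathlib import Path

# ICD manifest file names as listed above, keyed by package name.
ICD_FILES = {
    "vulkan-radeon": "radeon_icd.x86_64.json",
    "amdvlk": "amd_icd64.json",
    "vulkan-amdgpu-pro": "amd_pro_icd64.json",
}

ICD_DIR = Path("/usr/share/vulkan/icd.d")


def installed_icds(icd_dir=ICD_DIR):
    """Return {package: manifest path} for the manifests present in icd_dir."""
    found = {}
    for package, filename in ICD_FILES.items():
        path = Path(icd_dir) / filename
        if path.is_file():
            found[package] = str(path)
    return found


def env_for_driver(package, icd_dir=ICD_DIR, base_env=None):
    """Return a copy of the environment with VK_ICD_FILENAMES set for one driver."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = str(Path(icd_dir) / ICD_FILES[package])
    return env


if __name__ == "__main__":
    # Show which of the three Vulkan drivers have a manifest installed.
    for package, path in installed_icds().items():
        print(f"{package}: {path}")
```

The dictionary returned by env_for_driver("vulkan-radeon") could be passed as env= to subprocess.run when starting a game, so the setting only affects that one program instead of the whole session.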


@BS86 I already knew that. But we do not know whether @Schlaefer has several of these drivers installed, which could cause this “ring gfx timeout …” issue.

It seems that RDNA2 receives a lot more love than RDNA1. So don’t be sad, your experience may be much better.

I managed to get that ring gfx_0.0.0 timeout on my 6750 XT, too - but the reason was easily found. It happened within an hour of increasing my GPU undervolting value in corectrl. After reverting to the former value, my system was stable again for many hours of gaming.
Maybe you are also undervolting and took it too far …

No undervolting here.

That’s just the generic error that the kernel lost its connection to the GPU, which can happen for all kinds of reasons. Triggering a hardware failure by undervolting is probably one of them.

I have seen many ring gfx_0.0.0 timeouts in the last three months. They usually happen when I am just browsing the web. I did not have any of these issues before the kernel went to 6.0.* But again, this might be a bad GPU, Mesa, Picom or anything else, so it is very hard to debug… I can’t blame only Linux or Mesa, because I had my share of problems with Navi series cards in Windows as well. My wife has an older RX570 and it has been rock solid.

My GPU is an RX6600 (no undervolting or OC).

The latest soft reset points to Picom, but I had tons of these even on Wayland.

amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008005046ea000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00501031
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x1
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x000080050471e000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008001082bf000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x000080010c2d6000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008005046bf000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x000080050424e000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008005046aa000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x000080010c2d7000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008001082be000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:5 pasid:32780, for process picom pid 12133 thread picom:cs0 pid 12203)
amdgpu 0000:09:00.0: amdgpu:   in page starting at address 0x00008005046ab000 from client 0x1b (UTCL2)
amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
amdgpu 0000:09:00.0: amdgpu:          Faulty UTCL2 client ID: CB/DB (0x0)
amdgpu 0000:09:00.0: amdgpu:          MORE_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          WALKER_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
amdgpu 0000:09:00.0: amdgpu:          MAPPING_ERROR: 0x0
amdgpu 0000:09:00.0: amdgpu:          RW: 0x0
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, but soft recovered
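
A log like the one above gets long quickly, so here is a rough Python sketch for summarizing it: it counts the [gfxhub] page faults per process and picks out the ring timeout lines. The regular expressions are written only against the line formats shown in this log, so treat them as an assumption; other kernel versions may word these messages differently. The function name summarize is made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches lines like "... [gfxhub] page fault (... for process picom pid 12133 ...".
PAGE_FAULT_RE = re.compile(r"\[gfxhub\] page fault .*?for process (\S+) pid (\d+)")
# Matches lines like "... ring gfx_0.0.0 timeout, but soft recovered".
TIMEOUT_RE = re.compile(r"ring (\S+) timeout(, but soft recovered)?")


def summarize(lines):
    """Count page faults per (process, pid) and collect ring timeouts.

    Returns (faults, timeouts): faults is a Counter keyed by (name, pid),
    timeouts is a list of (ring name, soft_recovered) tuples.
    """
    faults = Counter()
    timeouts = []
    for line in lines:
        match = PAGE_FAULT_RE.search(line)
        if match:
            faults[(match.group(1), int(match.group(2)))] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append((match.group(1), match.group(2) is not None))
    return faults, timeouts


if __name__ == "__main__":
    # Reads the kernel log from standard input.
    faults, timeouts = summarize(sys.stdin)
    for (name, pid), count in faults.most_common():
        print(f"{count} page faults from {name} (pid {pid})")
    for ring, recovered in timeouts:
        state = "soft recovered" if recovered else "not soft recovered"
        print(f"ring {ring} timeout, {state}")
```

Saved under some name of your choosing, it could be fed the kernel log through a pipe, for example from journalctl -k. It only tells you which process was active when the faults hit, which is not necessarily the cause.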

I’ve been having this exact problem lately. I can replicate a complete system crash with my 5700XT using Blender. All I have to do is open it and in the scene with the default cube press numpad buttons quickly to change camera orientation. Within seconds of doing this the whole system completely freezes. Would be interesting to know if anybody can replicate this scenario with their AMD card too.

For me this shits my PC only when I play videos and the infamous GCVM_L2_PROTECTION_FAULT kicks in; in other cases it just soft resets. Because of its random nature it is very hard to replicate. I don’t do any heavy video editing :zipper_mouth_face:

I play quite often and this has never happened during gaming, so it is hard to believe that I have a faulty GPU.

I am pretty sure that my GPU is not the problem. For testing purposes I installed Ubuntu 22.10 and I cannot replicate the issue at all. I even tried EOS Live USB with XFCE and it is doing the exact same thing as the installed system with Plasma.