Problematic Behaviour with 7900xtx

New to forums. Need to resolve this behaviour.

Started from a clean install with bare minimum packages to get test game running (Windrose, behaviour occurs on a lot of others though). I acquired a 7900xtx as there seemed to be positive responses to using it under Linux, only now am I finding that it presents this error:

May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset succeeded
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: Starting gfx_0.0.0 ring reset
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0:  Process GameThread pid 6845 thread vkd3d_queue pid 6892
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=1885012, emitted seq=1885014
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created

Does anyone know of a resolution to this fault and/or more information I could present to help resolve it? It’s getting pretty tedious. I have seen some suggestions of mesa-git, but there seem to be caveats to that which I am unable to grasp.

System information

System:
  Host: fluffys-pc Kernel: 7.0.3-arch1-2 arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 6.6.4 Distro: EndeavourOS
Machine:
  Type: Desktop Mobo: Micro-Star model: PRO Z790-P DDR4 (MS-7E06) v: 1.0
    serial: <superuser required> Firmware: UEFI vendor: American Megatrends LLC.
    v: 1.90 date: 10/27/2023
CPU:
  Info: 8-core model: 13th Gen Intel Core i7-13700K bits: 64 type: MT MCP
    cache: L2: 16 MiB
  Speed (MHz): avg: 800 min/max: 800/5300:5400 cores: 1: 800 2: 800 3: 800
    4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800
    14: 800 15: 800 16: 800
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900
    XTX/7900 GRE/7900M] driver: amdgpu v: kernel
  Display: wayland server: X.org v: 1.21.1.22 with: Xwayland v: 24.1.11
    compositor: kwin_wayland driver: gpu: amdgpu resolution: 2560x1440~144Hz
  API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
    platforms: gbm,wayland,x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 26.0.6-arch1.1
    renderer: AMD Radeon RX 7900 XTX (radeonsi navi31 ACO DRM 3.64
    7.0.3-arch1-2)
  API: Vulkan v: 1.4.341 drivers: radv surfaces: N/A
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor wl: wayland-info
    x11: xdpyinfo, xprop, xrandr

Welcome to the community @PocketDragon :waving_hand::smiley: :enos_flag:

Thanks for sharing those details. I don’t think the first log tells us all that much. The “device wedged, but recovered through reset” line is essentially letting us know the GPU had a fault (a hang) and reset itself to recover.

Unfortunately, what’s not mentioned is what caused it (aside from a process ID, perhaps) or why.
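
The few details the log does carry (the ring that timed out, the two sequence numbers and the process that submitted the work) can be pulled out mechanically, which helps when comparing several crashes. A minimal Python sketch, assuming journal lines shaped like the ones quoted above; `summarize_hang` and both patterns are illustrative names of my own, not part of any existing tool:

```python
import re

# Patterns modelled on the journal lines quoted in this thread; other
# kernel versions may word these messages differently.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(
    r"Process (?P<process>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)


def summarize_hang(journal_text):
    """Collect ring, sequence numbers and process from a hang report.

    If the text contains several hang reports, later ones overwrite
    earlier ones, so pass one report at a time.
    """
    summary = {}
    for line in journal_text.splitlines():
        for pattern in (TIMEOUT_RE, PROCESS_RE):
            match = pattern.search(line)
            if match:
                summary.update(match.groupdict())
    return summary


# Two lines copied from the first log in this thread.
SAMPLE = """\
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0:  Process GameThread pid 6845 thread vkd3d_queue pid 6892
May 06 21:50:42 fluffys-pc kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=1885012, emitted seq=1885014
"""

print(summarize_hang(SAMPLE))
```

On the sample this reports ring gfx_0.0.0, process GameThread (pid 6845) and thread vkd3d_queue. As far as I understand the message, the gap between the emitted and signaled sequence numbers is the number of submissions still outstanding when the ring timed out.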

You mentioned you “acquired” this 7900XTX. Is it 2nd hand?

I’d also suggest simply re-seating it, being mindful to remove any possible dust in the slot, or on the card before re-seating it firmly, but carefully. Also re-check your power connectors for it.

Yeah, that’s kinda why I’m on the forums hoping someone has seen such a useless error that might know anything at all. The internet so far has proven incredibly useless, describing behaviours alongside the error message. Essentially when running any recent dx12 software, it kills itself after a given period.

Just incredibly tired; was bought from the store.

I am not familiar enough with hardware to know how to conduct these checks. I’ll look it up.

Was the GPU installed by the store?

Yes, I paid the local computer store to install it to avoid any strife from poor installation (I have not worked with them prior to this, but by all reports they’re competent). However, I have just conducted the checks outlined and I am going to try to establish the fault again.

This behaviour has been persistent for about a week, and I have given up and legged it to the forums, for clarity. Hence the nuked installation to have a clean slate to figure out the issue from.

All good. Just good to know there’s that fallback available, and also helps to know if a mate or family member might have done it, in which case, it’s worth re-checking.

I’d hope a store knew what they were doing though.

Given it was freshly installed, sometimes these simple checks can solve the issue without wasting hours of time fiddling with drivers and packages that actually had nothing to do with it.

So to clarify, this issue has persisted from before to after a fresh install? (that’d be another reason to check hardware first :sweat_smile:).

There have been a number of important BIOS updates to your motherboard too. I’d recommend grabbing the latest one.
https://www.msi.com/Motherboard/PRO-Z790-P-DDR4/support

My initial install had these behaviours, and this new install also has them.

Was hoping to get away with fwupd, as it did work at one point, but if it hasn’t, I guess I’ll do it properly.

Yeah, fault’s still present.

Spat out a bit more this time that doesn’t seem to say much else, but here it is anyway.

May 06 23:10:47 fluffys-pc kwin_wayland_wrapper[1566]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* Failed to initialize parser -125!
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: GPU reset(1) succeeded!
May 06 23:10:46 fluffys-pc firefox[4979]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
May 06 23:10:46 fluffys-pc steam[5617]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is innocent.
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring jpeg_dec uses VM inv eng 4 on hub 8
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring vcn_unified_1 uses VM inv eng 1 on hub 8
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring vcn_unified_0 uses VM inv eng 0 on hub 8
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 13 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 12 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.3.1 uses VM inv eng 11 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.2.1 uses VM inv eng 10 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 9 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 8 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.3.0 uses VM inv eng 7 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.2.0 uses VM inv eng 6 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] DMUB hardware initialized: version=0x07002F00
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: SMU is resumed successfully!
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: SMU driver if version not matched
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8300 (78.131.0)
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: SMU is resuming...
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: SECUREDISPLAY: optional securedisplay ta ucode is not available
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: RAP: optional rap ta ucode is not available
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: reserve 0x1300000 from 0x85fc000000 for PSP TMR
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: PSP is resuming...
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: VRAM is lost due to GPU reset!
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
May 06 23:10:46 fluffys-pc kernel: amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
May 06 23:10:45 fluffys-pc kernel: amdgpu 0000:03:00.0: GPU smu mode1 reset
May 06 23:10:45 fluffys-pc kernel: amdgpu 0000:03:00.0: GPU mode1 reset
May 06 23:10:45 fluffys-pc kernel: amdgpu 0000:03:00.0: MODE1 reset
May 06 23:10:45 fluffys-pc kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
May 06 23:10:45 fluffys-pc kernel: amdgpu 0000:03:00.0: failed to unmap legacy queue
May 06 23:10:45 fluffys-pc kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=REMOVE_QUEUE
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: GPU reset begin!. Source:  1
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset failed
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: The CPFW hasn't support pipe reset yet.
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: reset via MES failed and try pipe reset -110
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: failed to reset legacy queue
May 06 23:10:43 fluffys-pc kernel: amdgpu 0000:03:00.0: MES failed to respond to msg=RESET
May 06 23:10:41 fluffys-pc kernel: [drm:gfx_v11_0_bad_op_irq [amdgpu]] *ERROR* Illegal opcode in command stream
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: Starting gfx_0.0.0 ring reset
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0:  Process GameThread pid 6806 thread vkd3d_queue pid 6849
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=9863636, emitted seq=9863638
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: Dumping IP State Completed
May 06 23:10:41 fluffys-pc kernel: amdgpu 0000:03:00.0: Dumping IP State

Going to check bios update rubbish now and see if that brings it up to snuff. Also did not resolve it.

Would it be possible for you to please try booting into an LTS kernel, i.e. 6.18.x or 6.12.x, and see if you still face the issue? Please note that Linux LTS 6.12.x is available from the AUR and will require you to compile it, which is going to take a lot of time. On my under-powered system it took a few hours. This will help establish whether the problem is specific to Linux kernel 7.0.3 or affects other branches too.

Secondly can you please try to set an environment variable

RADV_DEBUG=nocompute

reboot the system and then check if you are still facing the same issue while running firefox and steam?
Try this with the Linux LTS 6.18.x kernel as well as Linux LTS 6.12.x kernel too.
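
If rebooting with the variable set system-wide is inconvenient, it can also be applied to a single program. A small Python sketch of that idea; `env_with_radv_debug` and `run_with_radv_debug` are hypothetical helper names, and `vkcube` is only an example of a Vulkan program to launch:

```python
import os
import subprocess


def env_with_radv_debug(flags, base_env=None):
    """Return a copy of the environment with RADV_DEBUG set to `flags`."""
    env = dict(os.environ if base_env is None else base_env)
    env["RADV_DEBUG"] = flags
    return env


def run_with_radv_debug(command, flags="nocompute"):
    """Run `command` (a list such as ["vkcube"]) with RADV_DEBUG applied.

    Only the launched process and its children see the variable; the
    rest of the session is left untouched.
    """
    return subprocess.run(command, env=env_with_radv_debug(flags), check=False)
```

For example, `run_with_radv_debug(["vkcube"])` starts the program with the variable set. RADV_DEBUG is read by the RADV Vulkan driver, so it should only affect Vulkan programs. For a Steam game, the usual equivalent is putting `RADV_DEBUG=nocompute %command%` in the game’s launch options.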

I am assuming that you have not overclocked your AMD GPU and that it is running at its factory settings. If you have, can you please reset it to default values? To check the various frequency values, please look at the files present in the directory /sys/class/drm/card1/device or /sys/class/drm/card0/device/. You are looking for the files pp_power_profile_mode, pp_od_clk_voltage and power_dpm_force_performance_level.
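
A rough Python sketch for reading those three files in one go. It assumes the sysfs layout described above; `read_tuning_files` and `find_amdgpu_device_dirs` are made-up helper names, and a file that is missing or unreadable is reported as None rather than treated as an error:

```python
from pathlib import Path

# File names as listed in the post above; the card index varies per system.
TUNING_FILES = (
    "pp_power_profile_mode",
    "pp_od_clk_voltage",
    "power_dpm_force_performance_level",
)


def read_tuning_files(device_dir):
    """Map each tuning file name to its text, or None if it cannot be read."""
    device = Path(device_dir)
    result = {}
    for name in TUNING_FILES:
        try:
            result[name] = (device / name).read_text().strip()
        except OSError:
            result[name] = None
    return result


def find_amdgpu_device_dirs(drm_root="/sys/class/drm"):
    """Return device directories of cards whose bound driver is amdgpu."""
    found = []
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        if "-" in card.name:
            continue  # skip connector entries such as card1-DP-1
        driver = card / "device" / "driver"
        if driver.exists() and driver.resolve().name == "amdgpu":
            found.append(card / "device")
    return found
```

Printing `read_tuning_files(d)` for each `d` in `find_amdgpu_device_dirs()` shows the current values. On an untouched card I would expect power_dpm_force_performance_level to read auto, and pp_od_clk_voltage may be absent if overdrive has not been enabled.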

You are using an Intel i7-13700K processor, which has an inbuilt GPU, i.e. Intel UHD Graphics 770. So basically you have two GPUs. However, the inxi command output given above does not list the Intel GPU. Just to confirm that is the case, would it be possible for you to please run the following command

lspci -k

and check that only a single VGA/Graphics device is listed and that it is AMD. Also check which Kernel driver in use is listed in the output. For Intel it will be i915 or equivalent, while for AMD it will be amdgpu.
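
To make that check less error-prone, the lspci -k output can be filtered down to graphics devices and their bound drivers. A minimal Python sketch, assuming the usual lspci -k layout (one unindented line per device, indented detail lines below it); `list_gpus` is a hypothetical helper. Note that an integrated GPU which is not the primary adapter may be listed as a Display controller rather than a VGA compatible controller, so the sketch looks for both:

```python
import re

DEVICE_RE = re.compile(r"^(?P<slot>\S+) (?P<cls>[^:]+): (?P<name>.+)$")
GPU_CLASSES = (
    "VGA compatible controller",
    "Display controller",
    "3D controller",
)


def list_gpus(lspci_output):
    """Return (slot, class, name, driver) for each graphics device.

    `driver` is None when no "Kernel driver in use" line follows the device.
    """
    gpus = []
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Unindented line: start of a new device entry.
            match = DEVICE_RE.match(line)
            current = None
            if match and match.group("cls") in GPU_CLASSES:
                current = [
                    match.group("slot"),
                    match.group("cls"),
                    match.group("name"),
                    None,
                ]
                gpus.append(current)
        elif current is not None and "Kernel driver in use:" in line:
            current[3] = line.split(":", 1)[1].strip()
    return [tuple(gpu) for gpu in gpus]
```

It can be fed live output with `list_gpus(subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout)` after an `import subprocess`, or with a pasted copy of the output.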

@PocketDragon

There are 11 newer Bios updates for this motherboard. I suggest you download the latest UEFI Bios update and use M Flash in the Bios settings screen to update it.

https://www.msi.com/Motherboard/PRO-Z790-P-DDR4/support

The behaviour became so prevalent that I had to uninstall the hardware until I had more time to debug it, I will be reinstalling it shortly to continue debugging it. Give me a couple hours to get back to you on this.

This update was conducted but the behaviour persisted enough that I had to swap back to my old card until I had time to sit down again and deal with it.


Response to steps:

lspci -k returns only a VGA/Graphics line pointing to AMD. Kernel driver in use lists both i915 and amdgpu.

Using LTS 6.12 resulted in the same faults. Setting RADV_DEBUG=nocompute caused the failure to present more rapidly.

Using LTS 6.18 resulted in the same behaviours as 6.12.


Going to try to roll back drivers to see if that does anything. Well, that’s not working, as it seems I’m misremembering how.

If the output of lspci -k shows only one GPU, i.e. AMD, then the driver in use cannot be i915. i915 is an Intel driver. You are running a 13th Gen Intel Core i7-13700K CPU, which has its own inbuilt GPU (iGPU): Intel® UHD Graphics 770.

Do you have a udev rule or some service or some timer which forces the Intel GPU to be switched off or disabled?

They should be able to set the BIOS to use the dedicated AMD GPU only.

@PocketDragon, what @ricklinux has said is spot on.

Can you please check in your BIOS which GPU has been set? Has the Intel UHD Graphics 770 been disabled?

Also can you please paste the output of lspci -k in this thread?

I would not think so, as this is a new install of Endeavour, with minimal configuration changes by me.
lspci -k output

00:00.0 Host bridge: Intel Corporation Raptor Lake-S Host Bridge/DRAM Controller (rev 01)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel modules: ie31200_edac
00:01.0 PCI bridge: Intel Corporation Raptor Lake PCI Express 5.0 Graphics Port (PEG010) (rev 01)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:02.0 Display controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
        DeviceName: Onboard - Video
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: i915
        Kernel modules: i915, xe
00:06.0 PCI bridge: Intel Corporation Raptor Lake PCI Express 4.0 Graphics Port (rev 01)
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:08.0 System peripheral: Intel Corporation GNA Scoring Accelerator module (rev 01)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
00:0a.0 Signal processing controller: Intel Corporation Raptor Lake Crashlog and Telemetry (rev 01)
        DeviceName: Onboard - Other
        Kernel driver in use: intel_vsec
        Kernel modules: intel_vsec
00:14.0 USB controller: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller (rev 11)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: xhci_hcd
        Kernel modules: mei_me
00:14.2 RAM memory: Intel Corporation Raptor Lake PCH Shared SRAM (rev 11)
        DeviceName: Onboard - Other
00:16.0 Communication controller: Intel Corporation Raptor Lake CSME HECI #1 (rev 11)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: mei_me
        Kernel modules: mei_me
00:17.0 SATA controller: Intel Corporation Raptor Lake SATA AHCI Controller (rev 11)
        DeviceName: Onboard - SATA
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: ahci
        Kernel modules: ahci
00:1b.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #17 (rev 11)
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:1b.4 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #21 (rev 11)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:1c.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #1 (rev 11)
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:1c.1 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #2 (rev 11)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:1c.2 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #3 (rev 11)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: pcieport
        Kernel modules: shpchp
00:1f.0 ISA bridge: Intel Corporation Z790 Chipset LPC/eSPI Controller (rev 11)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
00:1f.3 Audio device: Intel Corporation Raptor Lake High Definition Audio Controller (rev 11)
        DeviceName: Onboard - Sound
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 9e06
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_soc_avs, snd_sof_pci_intel_tgl, snd_hda_intel
00:1f.4 SMBus: Intel Corporation 700 Series Chipset SMBus Controller (rev 11)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801
00:1f.5 Serial bus controller: Intel Corporation Raptor Lake SPI (flash) Controller (rev 11)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: intel-spi
        Kernel modules: spi_intel_pci
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 10)
        Kernel driver in use: pcieport
        Kernel modules: shpchp
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 10)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
        Kernel driver in use: pcieport
        Kernel modules: shpchp
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] (rev c8)
        Subsystem: XFX Limited RX-79XMERCB9 [SPEEDSTER MERC 310 RX 7900 XTX]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
04:00.0 Non-Volatile memory controller: Micron/Crucial Technology P2 [Nick P2] / P3 / P3 Plus NVMe PCIe SSD (DRAM-less) (rev 01)
        Subsystem: Micron/Crucial Technology Device 5021
        Kernel driver in use: nvme
        Kernel modules: nvme
06:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 / Patriot Viper VP4300 Lite NVMe SSD (DRAM-less) (rev 01)
        Subsystem: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 / Patriot Viper VP4300 Lite NVMe SSD (DRAM-less)
        Kernel driver in use: nvme
        Kernel modules: nvme
08:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
        Subsystem: Intel Corporation Dual Band Wireless-AC 7265 [Stone Peak 2 AC]
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi
09:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7e06
        Kernel driver in use: igc
        Kernel modules: igc

Going to check bios now. There does not appear to be the option to disable the integrated graphics, but the default was set to the dgpu.

Good, so you have both the AMD GPU and the Intel iGPU installed and configured, with the proper drivers running. The Intel iGPU is using the i915 driver (xe is also listed as an available module). The AMD GPU is using amdgpu.
Both of them are active.

Have you updated all the firmware as requested by @ricklinux ?

This shows something interesting. It shows the AMD GPU has been wedged. That is, the GPU got reset due to some issue. More information on this over here. The AMD and Intel GPU drivers have both implemented this capability to inform the user about a wedged GPU.

@PocketDragon what you will need to do is look for logs above the timestamp May 06 23:10:47 and find out why the AMD GPU is getting wedged. Typical reasons might be

  1. Faulty Drivers and firmware - So update to latest AMD drivers and BIOS firmware.
  2. Monitor the VRAM utilization of AMD GPU and see if it spikes.
  3. Monitor the temp of the AMD GPU. If it spikes or goes on the higher side then that also might lead to this problem. In this case you will have to look at cooling the GPU. Or see if there is any dust or blockage in the vent and air gaps. This looks like a desktop, so be prepared that you might have to go for better or bigger cooling solution.
    Intel CPU/GPUs typically can handle for some time 90-100 degrees Centigrade temp. AMD CPU/GPU typically max out at 80-85 degrees Centigrade.
  4. Reduce the game’s/Steam’s and the monitor’s refresh rate to, say, 60 Hz or 90 Hz. See at which refresh rate the issue stops occurring.
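
For points 2 and 3, the amdgpu driver exposes VRAM usage and temperature through sysfs, so they can be logged while the game runs. A minimal Python sketch, assuming the files mem_info_vram_used, mem_info_vram_total and hwmon/hwmon*/temp1_input exist under the card’s device directory (the first two in bytes, temp1_input in millidegrees Celsius); `sample_gpu` is an illustrative name:

```python
from pathlib import Path


def _read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample_gpu(device_dir):
    """Take one reading of VRAM use (MiB) and temperature (deg C)."""
    device = Path(device_dir)
    used = _read_int(device / "mem_info_vram_used")
    total = _read_int(device / "mem_info_vram_total")
    temp = None
    # The hwmon index differs between boots and systems, so search for it.
    for hwmon in sorted(device.glob("hwmon/hwmon*")):
        temp = _read_int(hwmon / "temp1_input")
        if temp is not None:
            break
    mib = 1024 * 1024
    return {
        "vram_used_mib": None if used is None else used // mib,
        "vram_total_mib": None if total is None else total // mib,
        "temp_c": None if temp is None else temp / 1000,
    }
```

Calling `sample_gpu("/sys/class/drm/card1/device")` once a second in a loop and printing the result gives a simple trace to compare against the time of the crash. The card1 path is the one that appears in the log above and may differ on other systems.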

Also refer to this Reddit thread and turn on “enable unified gpu usage monitoring” in the MSI firmware that you have. This will have to be done AFTER you have updated your BIOS firmware, not before. If this does not work, then please set “unified gpu usage monitoring” back to its original value.

  • Turn on the computer and tap the DEL key to enter BIOS.
  • Press F7 to switch to Advanced Mode.
  • Navigate to Settings > Advanced > Integrated Graphics Configuration.
  • Set Initiate Graphic Adapter to PEG (PCI Express Graphics).
  • Set IGD Multi-Monitor to Disabled to ensure the system ignores the integrated graphics.
  • Press F10 to Save and Exit. [1, 2]

I have updated it.

Fully up to date. Though it seems I am not the only one experiencing deficiencies with the new AMD drivers. However, rolling back also has not worked at all.

I have yet to observe a spike when this occurs.

I haven’t observed the Intel side as I was focused on the AMD issues, but the hardware has only really been sitting near the 70-75 mark under load. I can investigate a better cooling solution regardless though. That said, the issue has presented itself even before the card has been under load for any length of time, crashing nigh immediately once load is placed on it.

I have checked this on a fair number of things and the behaviour presents itself regardless.


I will check these in order and report back.

This does not present anywhere in bios. I also cannot find documentation on it that does not strictly refer to a windows specific utility. Do you have documentation on how to go about this? I may be overlooking something but I could not find anything useful on it.

Issue still occurred with this one.

@PocketDragon
What is the power supply you are using for the 7900xtx?