Since Saturday or so, my graphical environment has been crashing to a screen of red and green colours. After a few seconds it restarts itself, and all my apps are closed. Here is a picture of one of my monitors; all three look roughly like this when it happens.
I recently migrated from a 2070 to a 5700 XT, and I had no issues with the new card until now. I have been using mesa-git from the AUR ever since. The card works fine on You-Know-Who (a Harry Potter reference, meaning Windows).
When you say "until now", do you mean it was running fine on Arch up till now? What changed? Updates? Settings? Newly installed applications? Is this running on some platform? Gaming?
I got the red and green colours once again tonight. Not sure if it’s related, but it has been crashing and restarting while I play ffxiv. I also updated to kernel 5.10, so it might be related to that. This is what appears on screen when rebooting:
The motherboard BIOS is up to date. I will check how to view my GPU BIOS version and whether it is up to date. I found a BIOS I can download, but I still need to figure out how to view the current version.
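For viewing the current GPU BIOS version, one possible approach is to read the `vbios_version` attribute that the amdgpu driver normally exposes in sysfs. The sketch below assumes the usual `/sys/class/drm/cardN/device/vbios_version` layout; the function name `read_vbios_versions` is made up for illustration, and the attribute may be absent on other drivers or older kernels.

```python
import re
from pathlib import Path


def read_vbios_versions(drm_root="/sys/class/drm"):
    """Return {card_name: vbios_version} for each card exposing the attribute.

    Assumption: the driver provides cardN/device/vbios_version (amdgpu does
    on typical setups). Cards without the file are simply skipped.
    """
    versions = {}
    for attr in sorted(Path(drm_root).glob("card*/device/vbios_version")):
        card = attr.parent.parent.name  # e.g. "card0"
        # Skip connector entries such as "card0-DP-1".
        if not re.fullmatch(r"card\d+", card):
            continue
        versions[card] = attr.read_text().strip()
    return versions


if __name__ == "__main__":
    found = read_vbios_versions()
    if not found:
        print("No vbios_version attribute found under /sys/class/drm")
    for card, version in found.items():
        print(f"{card}: {version}")
```

The printed string can then be compared by hand with the version of the downloadable BIOS; the script only reads sysfs and does not modify anything.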
I found some interesting errors with dmesg:
[ 137.651371] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx_0.0.0 timeout, signaled seq=118772, emitted seq=118774
[ 137.651422] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process ffxiv_dx11.exe pid 4417 thread ffxiv_dx11:cs0 pid 4443
[ 137.651424] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
[ 141.651428] amdgpu 0000:0d:00.0: amdgpu: failed to suspend display audio
[ 141.651743] WARNING: CPU: 4 PID: 311 at drivers/gpu/drm/amd/amdgpu/…/display/dc/dcn20/dcn20_resource.c:3240 dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
[ 141.651744] Modules linked in: rfcomm joydev hid_corsair hid_glorious cmac algif_hash algif_skcipher af_alg bnep vmnet(OE) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 amdgpu xt_hl vfat ip6t_rt fat mousedev snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio iwlmvm snd_hda_codec_hdmi nf_log_ipv4 nf_log_common snd_hda_intel ipt_REJECT snd_intel_dspcfg nf_reject_ipv4 soundwire_intel soundwire_generic_allocation xt_LOG soundwire_cadence eeepc_wmi asus_wmi mac80211 snd_usb_audio gpu_sched sparse_keymap snd_hda_codec video snd_usbmidi_lib ttm wmi_bmof btusb xt_limit snd_rawmidi mxm_wmi edac_mce_amd btrtl xt_addrtype btbcm xt_tcpudp drm_kms_helper snd_hda_core snd_seq_device kvm_amd btintel mc libarc4 snd_hwdep soundwire_bus bluetooth kvm snd_soc_core iwlwifi cec xt_MASQUERADE ecdh_generic ecc irqbypass crct10dif_pclmul snd_compress crc32_pclmul ac97_bus drm ghash_clmulni_intel snd_pcm_dmaengine aesni_intel
[ 141.651772] blackmagic_io(POE) snd_pcm ccp iptable_nat crypto_simd cfg80211 r8169 cryptd snd_timer agpgart glue_helper snd rapl syscopyarea rng_core realtek sysfillrect sp5100_tco mdio_devres sysimgblt pcspkr k10temp fb_sys_fops soundcore igb i2c_piix4 libphy i2c_algo_bit dca rfkill xt_conntrack wmi pinctrl_amd mac_hid ip6table_filter acpi_cpufreq ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter vmmon(OE) vmw_vmci vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) uinput i2c_dev fuse crypto_user bpf_preload ip_tables x_tables ext4 usbhid crc32c_generic crc16 mbcache jbd2 uas usb_storage crc32c_intel xhci_pci xhci_pci_renesas
[ 141.651858] RIP: 0010:dcn20_validate_bandwidth_fp+0x8d/0xd0 [amdgpu]
[ 141.651924] dcn20_validate_bandwidth+0x24/0x40 [amdgpu]
[ 141.651980] dc_validate_global_state+0x3c3/0x4c0 [amdgpu]
[ 141.652037] dm_suspend+0x18b/0x1c0 [amdgpu]
[ 141.652078] amdgpu_device_ip_suspend_phase1+0x69/0xc0 [amdgpu]
[ 141.652120] ? amdgpu_fence_process+0x44/0x150 [amdgpu]
[ 141.652160] amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 141.652216] amdgpu_device_pre_asic_reset+0x185/0x19c [amdgpu]
[ 141.652271] amdgpu_device_gpu_recover.cold+0x5d1/0x98a [amdgpu]
[ 141.652324] amdgpu_job_timedout+0x121/0x140 [amdgpu]
[ 142.114277] amdgpu 0000:0d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_2.1.0 test failed (-110)
[ 142.114325] [drm:gfx_v10_0_hw_fini [amdgpu]] ERROR KGQ disable failed
[ 142.357571] amdgpu 0000:0d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_2.1.0 test failed (-110)
[ 142.357620] [drm:gfx_v10_0_hw_fini [amdgpu]] ERROR KCQ disable failed
[ 142.601708] [drm:gfx_v10_0_hw_fini [amdgpu]] ERROR failed to halt cp gfx
[ 142.658397] amdgpu 0000:0d:00.0: amdgpu: BACO reset
[ 145.799311] amdgpu 0000:0d:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 145.937285] amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 145.943284] amdgpu 0000:0d:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 145.943286] amdgpu 0000:0d:00.0: amdgpu: SMU is resuming…
[ 145.943290] amdgpu 0000:0d:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
[ 145.943290] amdgpu 0000:0d:00.0: amdgpu: SMU driver if version not matched
[ 145.945638] amdgpu 0000:0d:00.0: amdgpu: SMU is resumed successfully!
[ 146.178058] amdgpu 0000:0d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 146.178059] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 146.178060] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 146.178060] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 146.178061] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 146.178062] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 146.178062] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 146.178063] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 146.178064] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 146.178064] amdgpu 0000:0d:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 146.178065] amdgpu 0000:0d:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 146.178065] amdgpu 0000:0d:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 146.178066] amdgpu 0000:0d:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[ 146.178066] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[ 146.178067] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[ 146.178068] amdgpu 0000:0d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 146.180770] amdgpu 0000:0d:00.0: amdgpu: recover vram bo from shadow start
[ 146.187754] amdgpu 0000:0d:00.0: amdgpu: recover vram bo from shadow done
[ 146.187879] [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Failed to initialize parser -125!
[ 146.187925] amdgpu 0000:0d:00.0: amdgpu: GPU reset(2) succeeded!
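To make it easier to see which process triggers each reset, here is a minimal sketch that pulls the ring timeout lines and the following "Process information" line out of dmesg output. The regular expressions are written against the log format pasted above and are an assumption; other kernel versions may word these messages differently, and process names containing spaces would not be captured fully. The name `summarize_timeouts` is made up for illustration.

```python
import re
import sys

# Patterns modelled on the amdgpu messages in the log above (assumed format).
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*ERROR ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
PROCESS_RE = re.compile(
    r"ERROR Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize_timeouts(lines):
    """Collect one dict per ring timeout, with the offending process if logged."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append({
                "time": float(match["ts"]),
                "ring": match["ring"],
                "signaled": int(match["signaled"]),
                "emitted": int(match["emitted"]),
                "process": None,
                "pid": None,
            })
            continue
        match = PROCESS_RE.search(line)
        # Attach process info to the most recent timeout that lacks it.
        if match and events and events[-1]["process"] is None:
            events[-1]["process"] = match["process"]
            events[-1]["pid"] = int(match["pid"])
    return events


if __name__ == "__main__":
    # Usage: dmesg | python this_script.py
    for event in summarize_timeouts(sys.stdin):
        print(event)
```

Run against the log above, this should report a single timeout on ring gfx_0.0.0 attributed to ffxiv_dx11.exe, which matches the crashes seen while playing.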