5700 XT started hanging with intensive games last few days

It had been working great for the 3 months since I built the new rig. A recent kernel/mesa update, perhaps?

Captured this on dmesg:

[12455.414836] ------------[ cut here ]------------
[12455.414893] WARNING: CPU: 9 PID: 830 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:3087 dcn20_validate_bandwidth_fp+0x6d/0xb0 [amdgpu]
[12455.414894] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_vc032x uvcvideo gspca_main videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common joydev input_leds videodev mousedev hid_steam mc cdc_acm btusb btrtl btbcm btintel hid_generic bluetooth usbhid ecdh_generic hid ecc iwlmvm lm92 mac80211 amdgpu libarc4 iwlwifi edac_mce_amd kvm_amd snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi nls_iso8859_1 snd_hda_intel nls_cp437 vfat irqbypass snd_intel_dspcfg wmi_bmof gpu_sched fat crct10dif_pclmul crc32_pclmul snd_hda_codec cfg80211 ttm ghash_clmulni_intel snd_hda_core drm_kms_helper snd_hwdep snd_pcm igb cec aesni_intel rc_core snd_timer syscopyarea crypto_simd sysfillrect ccp snd sp5100_tco cryptd i2c_algo_bit sysimgblt glue_helper pcspkr rng_core i2c_piix4 rfkill fb_sys_fops soundcore dca k10temp wmi evdev mac_hid pinctrl_amd acpi_cpufreq drm pkcs8_key_parser crypto_user agpgart ip_tables x_tables ext4
[12455.414917] crc32c_generic crc16 mbcache jbd2 crc32c_intel xhci_pci xhci_hcd
[12455.414921] CPU: 9 PID: 830 Comm: Xorg Tainted: G W 5.6.15-arch1-1 #1
[12455.414921] Hardware name: Gigabyte Technology Co., Ltd. X570 I AORUS PRO WIFI/X570 I AORUS PRO WIFI, BIOS F11 12/06/2019
[12455.414969] RIP: 0010:dcn20_validate_bandwidth_fp+0x6d/0xb0 [amdgpu]
[12455.414970] Code: 00 7b 35 22 85 e8 1d 00 00 75 2f 31 d2 f2 0f 11 85 a8 21 00 00 48 89 ee 4c 89 e7 e8 2d f6 ff ff 89 c2 22 95 e8 1d 00 00 75 2a <0f> 0b 48 89 9d a8 21 00 00 5b 5d 41 5c c3 75 c9 48 89 9d a8 21 00
[12455.414971] RSP: 0018:ffffb686008e3aa0 EFLAGS: 00010246
[12455.414972] RAX: 0000000000000001 RBX: 4079400000000000 RCX: 00000000011bca09
[12455.414972] RDX: 0000000000000000 RSI: 5e5490947de3ae50 RDI: 00000000000321a0
[12455.414973] RBP: ffff919a362c0000 R08: ffffb686008e3a48 R09: ffff919ae3880000
[12455.414973] R10: ffff919a5f87d000 R11: 0000000100000001 R12: ffff919ae3880000
[12455.414973] R13: ffff919a9c9fbf80 R14: ffff919af612a400 R15: ffff919a362c0000
[12455.414974] FS: 00007f92f4836e80(0000) GS:ffff919afec40000(0000) knlGS:0000000000000000
[12455.414975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12455.414975] CR2: 00007f928ed8e000 CR3: 00000007db472000 CR4: 0000000000340ee0
[12455.414976] Call Trace:
[12455.415029] dcn20_validate_bandwidth+0x24/0x40 [amdgpu]
[12455.415077] dc_validate_global_state+0x2f2/0x390 [amdgpu]
[12455.415129] amdgpu_dm_atomic_check+0xeca/0x1000 [amdgpu]
[12455.415141] drm_atomic_check_only+0x563/0x7f0 [drm]
[12455.415145] ? _raw_spin_unlock_irqrestore+0x20/0x40
[12455.415153] drm_atomic_commit+0x13/0x50 [drm]
[12455.415162] drm_atomic_connector_commit_dpms+0xda/0x100 [drm]
[12455.415171] drm_mode_obj_set_property_ioctl+0x167/0x2e0 [drm]
[12455.415181] ? drm_connector_set_obj_prop+0x90/0x90 [drm]
[12455.415189] drm_connector_property_set_ioctl+0x39/0x60 [drm]
[12455.415197] drm_ioctl_kernel+0xb2/0x100 [drm]
[12455.415206] drm_ioctl+0x208/0x360 [drm]
[12455.415214] ? drm_connector_set_obj_prop+0x90/0x90 [drm]
[12455.415251] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[12455.415254] ksys_ioctl+0x82/0xc0
[12455.415256] __x64_sys_ioctl+0x16/0x20
[12455.415258] do_syscall_64+0x49/0x90
[12455.415259] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[12455.415261] RIP: 0033:0x7f92f546f8eb
[12455.415261] Code: 0f 1e fa 48 8b 05 a5 95 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 95 0c 00 f7 d8 64 89 01 48
[12455.415262] RSP: 002b:00007ffd3e85e468 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[12455.415262] RAX: ffffffffffffffda RBX: 00007ffd3e85e4a0 RCX: 00007f92f546f8eb
[12455.415263] RDX: 00007ffd3e85e4a0 RSI: 00000000c01064ab RDI: 000000000000000d
[12455.415263] RBP: 00000000c01064ab R08: 0000000000000000 R09: 000055b99014b730
[12455.415264] R10: 0000000000000000 R11: 0000000000000246 R12: 000055b98ffba550
[12455.415264] R13: 000000000000000d R14: 000055b990fc03f0 R15: 0000000000000000
[12455.415266] ---[ end trace 400e01af85cba02e ]---
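
In case it helps anyone comparing traces, here is a rough Python sketch for pulling the key fields (source file, line, function, module) out of a kernel WARNING line like the one above. The regex and the name `parse_kernel_warning` are my own illustration, written against this one trace; other kernel versions may format the line differently.

```python
import re
from typing import Optional

# Pattern written against the WARNING line in the trace above (an assumption:
# other kernels may lay this line out differently).
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\S+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_kernel_warning(text: str) -> Optional[dict]:
    """Return the fields of the first kernel WARNING found in text, or None."""
    match = WARNING_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    return info
```

Feeding it the output of `dmesg` as a string gives a small dict that is easier to search for in bug trackers than the full dump.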

Similar errors seen here, but I’m not technical enough to know whether they’re relevant.

[brad@host ~]$ uname -a
Linux host 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux

I think I’ve sorted it out. Downgrading mesa and the kernel didn’t do anything - still got hangs and crashes after playing games for a random amount of time.

Looked a bit closer at the fan curve - the fans were never speeding up in response to increased temps. Changed a few ventilation settings using CoreCtrl, and it has been stable for the last few hours.

I guess the warmer weather in the northern hemisphere tipped my system past the point where it could cope with gaming without the fans ramping up. So, solved, I think?
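
For anyone who wants to check whether their fans are ignoring temperature the way mine were, here is a minimal Python sketch. It assumes the amdgpu driver exposes its sensors under `/sys/class/hwmon` with `temp1_input` (millidegrees Celsius) and `fan1_input` (RPM); the function names and the thresholds in `fan_ignores_heat` are made up for illustration, not taken from CoreCtrl or any other tool.

```python
from pathlib import Path
from typing import Dict, List, Optional


def find_amdgpu_hwmon(root: str = "/sys/class/hwmon") -> Optional[Path]:
    """Return the first hwmon directory whose 'name' file reads 'amdgpu'."""
    for hwmon_dir in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon_dir / "name"
        if name_file.is_file() and name_file.read_text().strip() == "amdgpu":
            return hwmon_dir
    return None


def read_sample(hwmon_dir) -> Dict[str, float]:
    """Read one temperature/fan-speed sample from an amdgpu hwmon directory."""
    hwmon_dir = Path(hwmon_dir)
    # temp1_input is reported in millidegrees Celsius.
    temp_c = int((hwmon_dir / "temp1_input").read_text()) / 1000.0
    # fan1_input is reported in revolutions per minute.
    fan_rpm = int((hwmon_dir / "fan1_input").read_text())
    return {"temp_c": temp_c, "fan_rpm": fan_rpm}


def fan_ignores_heat(samples: List[Dict[str, float]],
                     min_temp_rise_c: float = 10.0,
                     min_rpm_rise: int = 100) -> bool:
    """True if the temperature rose noticeably between the first and last
    sample while the fan speed barely changed. Thresholds are arbitrary."""
    if len(samples) < 2:
        return False
    temp_rise = samples[-1]["temp_c"] - samples[0]["temp_c"]
    rpm_rise = samples[-1]["fan_rpm"] - samples[0]["fan_rpm"]
    return temp_rise >= min_temp_rise_c and rpm_rise < min_rpm_rise
```

To use it, call `find_amdgpu_hwmon()` once, then collect `read_sample(...)` results every few seconds while a game is running and pass the list to `fan_ignores_heat`. If it returns True, the fan curve is probably worth a look.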
