AMD APU random black screen with kernel 5.10

I have one PC with an “AMD Ryzen 5 3400G with Radeon Vega Graphics”. With kernel 5.10, every once in a while, after several minutes of use, the screen goes black and keyboard and mouse input stop working. Remote login is still possible, so it is not completely frozen and I can reboot via ssh.

The DE is cinnamon. The iommu is turned off in the BIOS because of screen tearing issues with this CPU.

The issue does not happen with kernel 5.4. Anybody else experiencing the same thing?

The PC is still running Manjaro (mea culpa). I hope I can ask that question here anyways.

When the black screen happens I see over 1,800 of these messages in the log:

kernel: [drm] pstate TEST_DEBUG_DATA: 0x3EFE0000

Nothing else catches my attention. No WARNING messages etc.
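
To get an exact count of those messages rather than scrolling through the log, something like the following could be used. It is only a sketch: count_pstate_msgs is a made-up helper name, and it assumes the journal of the affected boot is readable with journalctl.

```shell
#!/bin/sh
# Count log lines containing the repeated [drm] pstate message.
# Reads log text from stdin, so it works with journalctl, dmesg or a saved file.
count_pstate_msgs() {
    # -F: treat the pattern as a fixed string, -c: print the number of matching lines.
    # grep exits non-zero when nothing matches; "|| true" keeps the helper quiet then.
    grep -F -c 'pstate TEST_DEBUG_DATA' || true
}

# Example (run on the affected machine, e.g. over ssh while the screen is black):
#   journalctl -k -b | count_pstate_msgs       # kernel messages of the current boot
#   journalctl -k -b -1 | count_pstate_msgs    # previous boot, after a reboot
```

Comparing the count between a boot with and without a black screen would show whether the message is really tied to the hang.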

inxi -F -z --no-host
System:    Kernel: 5.4.100-1-MANJARO x86_64 bits: 64 Console: tty 0 Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: Micro-Star model: B450-A PRO MAX (MS-7B86) v: 4.0 serial: <filter> 
           BIOS: American Megatrends LLC. v: M.C0 date: 02/03/2021 
CPU:       Info: Quad Core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2 MiB 
           Speed: 1259 MHz min/max: 1400/3700 MHz Core speeds (MHz): 1: 1259 2: 1258 3: 1259 4: 1278 5: 1260 6: 1259 7: 1260 
           8: 1335 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Display: server: X.Org 1.20.10 driver: loaded: amdgpu,ati unloaded: modesetting resolution: 2560x1440~60Hz 
           OpenGL: renderer: llvmpipe (LLVM 11.1.0 256 bits) v: 4.5 Mesa 20.3.4 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio driver: snd_hda_intel 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio driver: snd_hda_intel 
           Device-3: MCS FURUTECH ADL GT40α type: USB driver: hid-generic,snd-usb-audio,usbhid 
           Sound Server: ALSA v: k5.4.100-1-MANJARO 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 
           IF: enp34s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 2.73 TiB used: 1.96 TiB (71.9%) 
           ID-1: /dev/sda vendor: Western Digital model: WD20EZRZ-00Z5HB0 size: 1.82 TiB 
           ID-2: /dev/sdb vendor: Samsung model: SSD 850 EVO 1TB size: 931.51 GiB 
Partition: ID-1: / size: 58.57 GiB used: 23.55 GiB (40.2%) fs: xfs dev: /dev/sdb1 
           ID-2: /home size: 872.49 GiB used: 285.45 GiB (32.7%) fs: xfs dev: /dev/sdb3 
           ID-3: /opt size: 1.82 TiB used: 1.38 TiB (75.9%) fs: xfs dev: /dev/sda1 
Swap:      ID-1: swap-1 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram0 
           ID-2: swap-2 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram1 
           ID-3: swap-3 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram2 
           ID-4: swap-4 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram3 
           ID-5: swap-5 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram4 
           ID-6: swap-6 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram5 
           ID-7: swap-7 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram6 
           ID-8: swap-8 type: zram size: 1.9 GiB used: 0 KiB (0.0%) dev: /dev/zram7 
Sensors:   System Temperatures: cpu: 35.8 C mobo: N/A gpu: amdgpu temp: 35.0 C 
           Fan Speeds (RPM): N/A 
Info:      Processes: 255 Uptime: 1h 27m Memory: 60.77 GiB used: 1.59 GiB (2.6%) Shell: Zsh inxi: 3.3.01

I’m not sure if my problem is similar to yours, but is it a ‘wake from sleep’ issue? My Asrock Deskmini with a Ryzen 2200G CPU and Arch 5.11 has no problem at all with waking from sleep. However, my NUC7 has recently started to have that problem. The monitor actually appears to wake when a key is pressed after sleep, but no video is sent to the screen, so it remains black. This started for me about a week ago and I don’t remember why, or whether a kernel update for linux-lts was responsible. This is apparently a not-uncommon fault with Intel NUCs but perhaps not with AMD Ryzens. For my NUC7, I installed powerkit from the AUR, which also installs xscreensaver. The NUC will then wake from sleep with a keypress. You do get some annoying 1980s-style screen savers, but it’s possible to run xscreensaver-demo and choose just one sensible one.

No, this black screen occurs while working with the PC.

Same result with Linux 5.11?
Any options set in the kernel boot parameters?
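
For reference, the parameters the running kernel was booted with can be read from /proc/cmdline. Below is a rough sketch that filters out the ones usually relevant here; show_gpu_boot_params is a hypothetical helper name, and the list of prefixes (amdgpu., iommu, amd_iommu) is just an assumption about what is worth looking at.

```shell
#!/bin/sh
# Print boot parameters related to amdgpu or the IOMMU, one per line.
# Takes an optional file argument; defaults to the running kernel's command line.
show_gpu_boot_params() {
    cmdline_file="${1:-/proc/cmdline}"
    # Split the command line into one parameter per line, then keep only
    # parameters whose name starts with amdgpu., iommu or amd_iommu.
    tr -s ' ' '\n' < "$cmdline_file" | grep -E '^(amdgpu\.|iommu|amd_iommu)' || true
}

# Example:
#   show_gpu_boot_params
```

No output simply means none of these parameters are set.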

Manjaro kernels are not the same as Arch kernels - different patches, different config. Also, quite old versions if it’s running on stable.

I think you can install an Arch kernel on Manjaro and it will still boot… :thinking:

I have the same CPU/GPU as you do, and there are some differences I spotted - but I don’t know if they matter…

In particular - what is the ati driver doing in there? Also - the tearing issues I had were solved another way than yours, by putting an entry in X11. Here’s inxi -Fz

┌08:16:29 WD= [~/Downloads]
└───freebird@aerie ─▶$ inxi -Fz
System:    Kernel: 5.10.16-arch1-1 x86_64 bits: 64 Desktop: Xfce 4.16.0 Distro: Arch Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: PRIME B450-PLUS v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 2008 
           date: 12/06/2019 
CPU:       Info: Quad Core model: AMD Ryzen 5 2400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2 MiB 
           Speed: 3819 MHz min/max: 1600/3900 MHz Core speeds (MHz): 1: 3819 2: 3118 3: 3592 4: 3867 5: 3897 6: 3120 7: 3385 
           8: 3507 
Graphics:  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] driver: amdgpu v: kernel 
           Display: x11 server: X.Org 1.20.10 driver: loaded: amdgpu unloaded: modesetting resolution: 3840x2160~60Hz 
           Message: Unable to show advanced data. Required tool glxinfo missing. 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio driver: snd_hda_intel 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio driver: snd_hda_intel 
           Sound Server: ALSA v: k5.10.16-arch1-1 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 
           IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           IF-ID-1: wgpia0 state: unknown speed: N/A duplex: N/A mac: N/A 
Drives:    Local Storage: total: 6.6 TiB used: 1.57 TiB (23.8%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 250GB size: 232.89 GiB 
           ID-2: /dev/sda vendor: Western Digital model: WDS100T2B0A-00SM50 size: 931.51 GiB 
           ID-3: /dev/sdb vendor: Western Digital model: WD40EFRX-68N32N0 size: 3.64 TiB 
           ID-4: /dev/sdc type: USB vendor: Seagate model: Expansion Desk size: 1.82 TiB 
Partition: ID-1: / size: 111.93 GiB used: 20.65 GiB (18.4%) fs: ext4 dev: /dev/sda5 
Swap:      ID-1: swap-1 type: partition size: 1.92 GiB used: 0 KiB (0.0%) dev: /dev/nvme0n1p7 
Sensors:   System Temperatures: cpu: 49.0 C mobo: N/A gpu: amdgpu temp: 49.0 C 
           Fan Speeds (RPM): N/A 
Info:      Processes: 246 Uptime: 3d 31m Memory: 13.65 GiB used: 3.74 GiB (27.4%) Shell: Bash inxi: 3.3.00 

And here is the tearing fix (in /etc/X11/xorg.conf.d):

Section "Device"
    Identifier "AMD"
    Driver "amdgpu"
    Option "TearFree" "on"
EndSection

I am not sure if it’s needed any more (with the newer kernels) but it is still there on most of my builds. No problems for a long time, not on 5.10, or on 5.11 so far.
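
One way to check whether the TearFree option actually took effect is to look at the output properties reported by xrandr. This is a sketch only: show_tearfree is a made-up helper name, and it assumes the amdgpu X driver exposes a TearFree property on the outputs, which may depend on the driver version.

```shell
#!/bin/sh
# Keep only the TearFree property lines from `xrandr --prop` output read on stdin.
show_tearfree() {
    # -i: match case-insensitively; "|| true" avoids a failure status when
    # the property is not listed at all.
    grep -i 'TearFree' || true
}

# Example (inside the running X session):
#   xrandr --prop | show_tearfree
```

If nothing is printed, the property is not exposed and the Xorg log is the next place to look.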

That is indeed a very good point: loaded: amdgpu,ati looks weird

Based on your and @jonathon’s feedback I did the following:

*) I activated iommu in the BIOS again and set the “TearFree” option in xorg.conf.d instead
*) I installed the arch linux-lts kernel 5.10.20

The inxi driver section makes more sense now: loaded: amdgpu
No mention of ati.

I will leave it at that and see if that fixed it. Sooner or later I will migrate this machine to EnOS. It is the last man standing in our household. :wink:

Thank you!

On Cinnamon, do you have window effects turned off in the Effects settings? I would not have iommu off.

I also see your graphics card isn’t working properly.

OpenGL: renderer: llvmpipe (LLVM 11.1.0 256 bits) v: 4.5 Mesa 20.3.4 

I see it’s running on llvmpipe, Mesa’s software renderer, instead of showing the card here. It should show the card as the renderer, like mine.

OpenGL: renderer: Radeon RX 590 Series (POLARIS10 DRM 3.40.0 5.11.2-arch1-1 LLVM 11.1.0) v: 4.6 Mesa 20.3.4 
direct render: Yes 

As @freebird54 suggests, you should set the TearFree option for screen tearing too. I would also look for the latest UEFI BIOS update.
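
The renderer check above can also be done without inxi by looking at the renderer line that glxinfo prints. A minimal sketch, assuming glxinfo is installed; check_renderer is a hypothetical helper name, and treating llvmpipe and softpipe as the software renderers is my assumption.

```shell
#!/bin/sh
# Read `glxinfo -B` output on stdin and report whether OpenGL is using a
# software rasterizer (llvmpipe/softpipe) or a hardware driver.
check_renderer() {
    # Take the first "OpenGL renderer string" line, if there is one.
    renderer=$(grep -i 'OpenGL renderer string' | head -n 1) || true
    case "$renderer" in
        "")                    echo "unknown: no renderer line found" ;;
        *llvmpipe*|*softpipe*) echo "software rendering: ${renderer#*: }" ;;
        *)                     echo "hardware rendering: ${renderer#*: }" ;;
    esac
}

# Example (inside the running X session):
#   glxinfo -B | check_renderer
```

On a working setup this should name the GPU rather than llvmpipe.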

I do not have a dedicated video card. It is an APU: AMD Ryzen 5 3400G with Radeon Vega Graphics

Unfortunately this info is missing in the output of @freebird54, so I cannot compare.
@freebird54 can you please install glxinfo and run inxi -F again? I am wondering what your APU says in that section. For me, with arch kernel 5.10.20, it looks like this now:

Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Display: server: X.Org 1.20.10 driver: loaded: amdgpu resolution: 2560x1440~60Hz 
           OpenGL: renderer: llvmpipe (LLVM 11.1.0 256 bits) v: 4.5 Mesa 20.3.4

I understand you don’t have a dedicated card, and I’m not sure how it reports for an onboard GPU. Maybe @freebird54 could install glxinfo and show his graphics with inxi -Fxxxz --no-host and/or inxi -Ga

Had to figure out which build had it on! (not to mention switching machines):

┌11:28:25 WD= [~/Downloads]
└───freebird@aerie ─▶$ inxi -Ga
Graphics:  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] vendor: ASUSTeK driver: amdgpu v: kernel 
           bus ID: 09:00.0 chip ID: 1002:15dd class ID: 0300 
           Display: x11 server: X.Org 1.20.10 driver: loaded: amdgpu unloaded: modesetting alternate: ati,fbdev,vesa 
           display ID: :0.0 screens: 1 
           Screen-1: 0 s-res: 3840x2160 s-dpi: 96 s-size: 1016x571mm (40.0x22.5") s-diag: 1165mm (45.9") 
           Monitor-1: HDMI-A-0 res: 3840x2160 hz: 60 dpi: 157 size: 621x341mm (24.4x13.4") diag: 708mm (27.9") 
           OpenGL: renderer: AMD Radeon Vega 11 Graphics (RAVEN DRM 3.40.0 5.10.16-arch1-1 LLVM 11.1.0) v: 4.6 Mesa 20.3.4 
           direct render: Yes 

Hope it helps…

Yes it does in my opinion. Yours is showing AMD Radeon Vega 11 graphics which is correct.

So it looks like something is wrong with the setup of my PC.

Tomorrow is Sunday. I will take the opportunity and migrate the PC to EndeavourOS. The arch kernel 5.10.20 did not give any black screen so far. That makes me confident that the migration will finally solve the issue.

Thank you all!

I moved the PC to EndeavourOS. All services are up and running. Home directories retained. Perfect!

The inxi graphics info seems to be ok now:

Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel 
           Display: x11 server: X.org 1.20.10 driver: loaded: amdgpu,ati unloaded: fbdev,modesetting,vesa 
           resolution: <missing: xdpyinfo> 
           OpenGL: renderer: AMD Radeon Vega 11 Graphics (RAVEN DRM 3.40.0 5.10.20-1-lts LLVM 11.1.0) v: 4.6 Mesa 20.3.4 

Yes it looks right now. Nice to see you got it working on EndeavourOS now. :slightly_smiling_face:

I have very rare black screens on my 3400G on 5.10.7, which only last for ~3 seconds. On Sway the situation is a bit worse: I sometimes experience graphics corruption, followed by a frozen system, when using Firefox. I’ll have to see if 5.12 is good.