Random black screen/crashing with graphical glitches

Lately this has been happening more and more. Typically the screen freezes, goes black for a second, then resumes, sometimes with graphical glitches (think a square of pixels in random colors).

As far as I can tell, only a restart fixes it, and it's started happening almost daily. I've been running fairly demanding MATLAB simulations lately, so I began to wonder whether MATLAB is somehow hogging my memory. That said, the error sometimes shows up hours after I've closed MATLAB, so maybe not.

Regardless, is there any good way to check an error log to diagnose what happened?

Topic edited to correct spelling

Edit: Using GNOME 40.2.0 with X11

Hello @morten_nissov
It would help to know which desktop you are running. Could you also post a link to the output of the following hardware-info command?

inxi -Fxxxz --no-host | eos-sendlog

Sorry, I didn't think to include that. I've added my DE info to the original post in an edit.

Here’s the link http://ix.io/3r1L

Okay, I looked at the hardware and I see it is using the proper module, amdgpu. Does the black screen ever happen at boot?

Never at boot, at least not that I've seen. The black screen only lasts a second or so; it's mostly the freezing and being forced to restart.

Have you looked at any journalctl logs or dmesg? There are also some troubleshooting log tools on the Welcome screen.

Honestly, no. I'm not very familiar with these things. Do you have a recommendation for where to start?
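For anyone else starting out, here is a minimal sketch of where to look, assuming the systemd journal is kept on disk (persistent); otherwise the previous boot's log is gone after a restart. journalctl -b -1 shows the previous boot and -k limits it to kernel (dmesg) messages. gpu_errors_tail is a hypothetical helper name, and its grep patterns are only the amdgpu/DRM messages that turn up in this thread, not a complete list.

```shell
# Keep only GPU-related error lines from a log read on stdin and show the
# last N of them (default 50), i.e. the ones closest to the freeze.
# The patterns are an assumption based on the messages seen in this thread.
gpu_errors_tail() {
  local n=${1:-50}
  grep -E 'amdgpu.*\*ERROR\*|retry page fault|flip_done timed out' | tail -n "$n"
}

# Usage (needs a persistent journal so the previous boot is still there):
#   journalctl --list-boots                 # 0 = this boot, -1 = the one before
#   journalctl -b -1 -k | gpu_errors_tail 20
#   journalctl -b -1 -p err                 # everything logged at "err" or worse
```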

I took a look using journalctl -b -1, and if I had to guess, the problem occurred somewhere around here:

Jun 24 19:34:39 endeavour kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jun 24 19:34:39 endeavour kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out
Jun 24 19:34:39 endeavour systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jun 24 19:34:39 endeavour audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostnam>
Jun 24 19:34:39 endeavour kernel: audit: type=1131 audit(1624556079.523:127): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr>
Jun 24 19:34:49 endeavour kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103600000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103601000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103602000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103603000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103604000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103605000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103606000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103607000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103608000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32777, for process vlc pid 10746 thread vlc:cs0 pid 10769)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800103609000 from client 27
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00601031
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 24 19:34:51 endeavour kernel: amdgpu 0000:05:00.0: amdgpu:          RW: 0x0
Jun 24 19:34:59 endeavour kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jun 24 19:35:00 endeavour systemd-sleep[15272]: Suspending system...
Jun 24 19:35:00 endeavour kernel: PM: suspend entry (deep)
Jun 24 21:23:24 endeavour kernel: Filesystems sync: 0.003 seconds
Jun 24 21:23:24 endeavour kernel: Freezing user space processes ... (elapsed 0.004 seconds) done.
Jun 24 21:23:24 endeavour kernel: OOM killer disabled.
Jun 24 21:23:24 endeavour kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Jun 24 21:23:24 endeavour kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jun 24 21:23:24 endeavour kernel: psmouse serio2: Failed to disable mouse on synaptics-pt/serio0
Jun 24 21:23:24 endeavour kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jun 24 21:23:24 endeavour kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out
Jun 24 21:23:24 endeavour kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:78:eDP-1] flip_done timed out
Jun 24 21:23:24 endeavour kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:55:plane-3] flip_done timed out
Jun 24 21:23:24 endeavour kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:65:plane-5] flip_done timed out
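Because the same page-fault block repeats many times, it can help to collapse a log like this into counts of distinct messages. A rough sketch: summarize_log is a hypothetical helper that keeps the kernel lines, drops the timestamp/host prefix, masks hex addresses as 0xN so near-identical lines group together, and counts them.

```shell
# Collapse a log read on stdin into "count message" lines, most frequent first.
summarize_log() {
  grep 'kernel: ' |
    sed -E -e 's/^.*kernel: //' -e 's/0x[0-9a-fA-F]+/0xN/g' |  # strip prefix, mask hex
    sort | uniq -c | sort -rn
}

# Usage:
#   journalctl -b -1 -k | summarize_log | head -n 20
```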

The timing makes sense too, though next time it happens I'll restart sooner and check the journal.

Yes, I see you have a lot of error messages. Could you post links to the following? They're easier to read in full screen.

sudo dmesg | eos-sendlog

Also run journalctl again:

journalctl -b -1 | eos-sendlog

dmesg: http://ix.io/3r1Q

journalctl: http://ix.io/3r1R

Have you tried any kernel parameters on the default GRUB command line? Maybe try the following:

iommu=pt 

You can add this to the default GRUB command line in /etc/default/grub, after “quiet”. Let me know if you need instructions. Then run the GRUB update command:

sudo grub-mkconfig -o /boot/grub/grub.cfg

Then test with it for a while and see whether you still get the errors and whether things are better or worse. There are many other parameters you could try.
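If you'd rather not hand-edit the file, here is a sketch that prints /etc/default/grub with a parameter appended inside the GRUB_CMDLINE_LINUX_DEFAULT quotes, so you can review the result before writing anything. add_grub_param is a hypothetical helper; it assumes the value is double-quoted on a single line, and it appends at the end rather than right after “quiet” (the order of these particular parameters shouldn't matter).

```shell
# Print a GRUB defaults file (default /etc/default/grub) with one kernel
# parameter appended inside GRUB_CMDLINE_LINUX_DEFAULT="...".
# A line that already contains the parameter is left unchanged.
# Nothing is written to disk. Simple sketch: the parameter must not contain "/".
add_grub_param() {
  local param=$1
  local file=${2:-/etc/default/grub}
  sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/[" ]'"$param"'[" ]/!s/"$/ '"$param"'"/;}' "$file"
}

# Usage:
#   add_grub_param iommu=pt | diff /etc/default/grub -      # preview the change
#   sudo cp /etc/default/grub /etc/default/grub.bak         # keep a backup
#   add_grub_param iommu=pt > /tmp/grub.new && sudo cp /tmp/grub.new /etc/default/grub
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
```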

So you mean adding it to GRUB_CMDLINE_LINUX_DEFAULT="quiet loglevel=3 nowatchdog"?

As for kernel parameters, I don't really know anything about them, so I haven't tried any.

You can use nano:

sudo nano /etc/default/grub

Add:

iommu=pt

Example: GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt loglevel=3 nowatchdog"

Then save the file:

Ctrl + o, then Enter, to write out (save) the file

Ctrl + x to exit

Then run the GRUB update command:

sudo grub-mkconfig -o /boot/grub/grub.cfg

Note: in both Ctrl + o and grub-mkconfig -o, that's the letter o, not zero.
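To confirm the change actually took effect after rebooting, the running kernel's command line is in /proc/cmdline. has_kernel_param is a small hypothetical helper that checks it for a whole-word match (so iommu=p won't match iommu=pt); the optional second argument is only there for testing.

```shell
# Succeed if the kernel command line contains the given parameter as a
# whole word. Reads /proc/cmdline unless a string is passed as $2.
has_kernel_param() {
  local param=$1
  local cmdline="${2:-$(cat /proc/cmdline)}"
  case " $cmdline " in
    *" $param "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, after rebooting:
#   cat /proc/cmdline
#   has_kernel_param iommu=pt && echo "iommu=pt is active"
```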

Is there a place where these entries are listed with an explanation? I tried the Arch Wiki, but its kernel parameters page doesn't seem complete. For example, iommu wasn't mentioned as far as I could see.

Kernel parameters are used all the time. The full list is in the kernel documentation:

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html?highlight=amd_iommu

Edit: There is also this on the Arch Wiki:

https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF

@morten_nissov
You may want to look at updating the UEFI BIOS. I don't have your laptop's serial number to check exactly, but if I have the right model, there have been 6 updates since the version listed in your hardware info. I'm not 100% sure, as I only have a model number. Check that and update if possible, since AMD releases new AGESA updates frequently. This may also help with your issue.
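To check the model and current firmware version without root, the kernel exposes DMI data under /sys/class/dmi/id (sudo dmidecode -s bios-version reports the same version string). bios_info is a hypothetical helper; compare its output with the manufacturer's support page for your exact model. If the vendor publishes firmware through LVFS, fwupdmgr get-updates may also list it, but many laptops aren't covered, so treat that as optional.

```shell
# Print vendor, model, BIOS version and BIOS date from the DMI sysfs files.
# The directory defaults to /sys/class/dmi/id (these files are world-readable).
bios_info() {
  local dir=${1:-/sys/class/dmi/id}
  local f
  for f in sys_vendor product_name bios_version bios_date; do
    printf '%-13s %s\n' "$f:" "$(cat "$dir/$f")"
  done
}

# Usage:
#   bios_info
```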

Following your advice, it looks like the problem is solved. Time will tell, since the crashing seems somewhat random, but so far it looks good.

Thanks

Have you checked the journal again to see whether there are fewer error messages?
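One way to make that comparison concrete is to count the GPU error lines per boot. A sketch, assuming a persistent journal: count_gpu_errors is a hypothetical helper using the same three message patterns seen in this thread. Raw counts are only roughly comparable, since a boot that ran longer will usually log more.

```shell
# Count GPU error lines in a log read on stdin. grep -c prints 0 when
# nothing matches; "|| true" keeps that case from counting as a failure.
count_gpu_errors() {
  grep -cE 'amdgpu.*\*ERROR\*|retry page fault|flip_done timed out' || true
}

# Usage: compare the last few boots.
#   for b in 0 -1 -2 -3; do
#     printf 'boot %s: %s\n' "$b" "$(journalctl -b "$b" -k | count_gpu_errors)"
#   done
```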

Sorry for the delay in responding. I had intended to reboot and capture the data over a couple of days to compare the errors, but I didn't quite get around to it over the weekend.

That said, the original error happened again. I made sure to restart quickly, so the end of this journal is roughly when the error started and the resulting “freeze”:
http://ix.io/3rsZ

Did iommu=pt help? I do see a lot of errors, but they seem to be different. I wish I could help. :man_shrugging: