System becomes unresponsive shortly after boot

I have all my personal data in the secondary drive, and keep backups in an external drive. I only keep OS data in the primary drive.

XFCE finally managed to crash, but it took much longer than Plasma. I’m going to remove this installation with Plasma/XFCE and see if the default Endeavour version behaves any differently.

Adding to a comment above, I’ve had no problems whenever I’ve used the Windows partition during this period. It can run for hours. It’s the only partition I’ve left untouched while troubleshooting this.

I tried the XFCE version, and while the issue hasn’t gone away, we might be getting somewhere.

I installed the offline XFCE version. I ran fsck -p beforehand to fix any filesystem issues.

The system booted fine, and ran fine for several sessions. I could run a session for 1-2 hours. I rebooted regularly to test the system and see if I could reproduce the issue. I did not upgrade the system. Eventually the issue returned.

The behaviour is slightly different from the Plasma version. The interface freezes, but now I can switch to a working tty, log in and enter commands. Interestingly enough, if I switch back to XFCE, the lockscreen shows up and I can log back in. The system works for just a bit before freezing up again. This is reproducible.

With the above said, I finally captured the output of journalctl -k -b -0 at the time: https://pastebin.com/pRYcTGrJ

eos-sendlog did not work because I was not connected to the Internet.

There is an updated UEFI BIOS version, F64, for this motherboard, which includes the updated AMD AGESA V2 1.2.0.8.

You have a good nvme m.2 drive WD SN770. You also have a Samsung SSD. I’m not sure how you have EOS installed.

I have WD M.2 drives also and they are phenomenal. I have no issues, but I am using btrfs, and I dual boot with Windows 11 installed on an M.2 drive and EOS installed on my SSD.

I would also suggest that when doing a memory test you do a long memory test with multi pass. If you have errors on the drive with the file system sometimes software cannot correct them and it is better to wipe the drive and reinstall with a clean start.

I would look at updating the UEFI BIOS, since on AMD systems the AGESA version is important.

The log shows a gap between 16:56:49 and 17:26:48. It doesn’t seem like there’s anything in there that indicates the cause of the issue.
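For anyone who wants to spot gaps like this without reading the whole log, here is a minimal sketch. It assumes the log is printed with `journalctl -o short-unix`, which puts the time in seconds since the epoch in the first field. The function name `find_log_gaps` and the default 60-second threshold are made up for illustration.

```shell
# Report consecutive log lines whose timestamps are more than $1 seconds
# apart (default: 60). Reads `journalctl -o short-unix` style lines on stdin,
# where the first field is the time in seconds since the epoch.
# find_log_gaps is a hypothetical helper name, not part of any tool.
find_log_gaps() {
    awk -v limit="${1:-60}" '
        # Skip lines that do not start with an epoch timestamp.
        $1 !~ /^[0-9]+(\.[0-9]+)?$/ { next }
        seen && ($1 - prev) > limit {
            printf "gap of %d s between %s and %s\n", $1 - prev, prev, $1
        }
        { prev = $1; seen = 1 }
    '
}
```

It could be used as `journalctl -k -b -0 -o short-unix --no-pager | find_log_gaps 300`. A gap only shows that nothing was logged in that window; it does not say why.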

This means that the display server crashed. Can you post the contents of the Xorg log file?

cat /var/log/Xorg.0.log

Can you post the contents of the Xorg log file?

Yes, here you have them: https://pastebin.com/czAhAGqb

There is an updated UEFI BIOS version, F64, for this motherboard, which includes the updated AMD AGESA V2 1.2.0.8.

Thank you for the heads-up, I will upgrade the BIOS.

You have a good nvme m.2 drive WD SN770. You also have a Samsung SSD. I’m not sure how you have EOS installed.

I have partitions for EOS and W10 on the m.2 drive. The SSD is used only for personal data, to keep it separate from OS data.

I would also suggest that when doing a memory test you do a long memory test with multi pass. If you have errors on the drive with the file system sometimes software cannot correct them and it is better to wipe the drive and reinstall with a clean start.

I will perform a multi-pass memory test. I’ve also been thinking of wiping the entire drive including the Windows partition. I have already created a system image anyway, but for the moment I’ll keep troubleshooting the latest installation with anthony93.

It’s been quite some time since my last update. I wished to provide one last update to bring closure to the thread. I found out I was dealing with driver and hardware issues.

The system becoming unresponsive after boot was caused by a driver issue with the Wi-Fi dongle. The issue occurred while the computer was attempting to establish an Internet connection after boot. This explains why it popped up after an update or after a fresh installation: it seems the driver doesn’t behave with a newer version of some package. It happens specifically when connecting to a 5 GHz network. The device is a TP-Link Archer T2U Plus with the RTL8821au chipset. The driver in question is the rtl8821au-dkms-git package in the AUR. Unloading the module with rmmod, removing the package and re-installing it sorts it.
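The steps above can be sketched roughly as follows. This is not the exact sequence used in the thread: the module name `8821au` is an assumption (check with `lsmod | grep -i 8821`), `yay` is assumed to be the AUR helper, and the final `modprobe` is an extra step added here to load the freshly built module. The `run` wrapper and `DRY_RUN` variable are hypothetical helpers for previewing the commands first.

```shell
#!/usr/bin/env bash
# Sketch: unload the Wi-Fi module, remove the AUR package, reinstall it.
MODULE="${MODULE:-8821au}"        # assumption: confirm with `lsmod | grep -i 8821`
PACKAGE="rtl8821au-dkms-git"      # package name as given in the post
AUR_HELPER="${AUR_HELPER:-yay}"   # assumption: yay is installed

# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_wifi_driver() {
    run sudo rmmod "$MODULE"          # unload the running module
    run sudo pacman -R "$PACKAGE"     # remove the DKMS package
    run "$AUR_HELPER" -S "$PACKAGE"   # rebuild and reinstall from the AUR
    run sudo modprobe "$MODULE"       # assumption: load the new module again
}
```

Setting `DRY_RUN=1` and then calling `reinstall_wifi_driver` prints the four commands without touching the system, which is a sensible first step before running them for real.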

This did not sort out the error I outlined in post #17. While using the Windows partition as backup, I had random BSODs with WHEA_UNCORRECTABLE_ERROR, pointing to a hardware issue. In Linux, the issue manifested as outlined in post #17. In the thread, I had already confirmed the health of my drives, and following ricklinux’s advice, I ran a multi-pass RAM test (all clear) and updated BIOS firmware to no avail. I probed the power supply with a multimeter, which read excellent values, as expected of SeaSonic. That left only the CPU and the motherboard.

I tackled the problem following Windows’ error, which led me to this thread on Reddit.

The crashes relate to overclocking/voltage settings on Ryzen 5000 processors. The thread offers a variety of possible solutions, from modifying BIOS settings to simply replacing the CPU and/or motherboard. What worked for me was turning off Precision Boost Overdrive (PBO) in BIOS. For good measure, I checked all connections, reseated the CPU and replaced the thermal paste. I have not had any issues since, and I have even gone back to Plasma (looking back, I might have grown fond of XFCE).

I found little discussion on this among Linux communities. I’m hoping this post helps any future researchers who may be facing similar issues.

Thank you all for your guidance.

I have the Ryzen 3800X, which is a 105 W processor. It is running on an MSI X570 board. I have no issues and my dmesg is absolutely clean. I run Precision Boost Overdrive in auto mode, but I am also running in ECO mode, which is 65 W. I don’t have a Wi-Fi dongle, but I do have an internal PCIe card that is a TP-Link with the Broadcom BCM4360 chip. I have used a number of Broadcom-based TP-Link cards and they have all been great. I don’t normally change many settings in the UEFI BIOS screen trying to tweak it. I keep my UEFI BIOS up to date and run mostly stock settings except for these couple of changes. Glad you got it sorted out.