System becomes unresponsive shortly after boot

Hello. My system keeps crashing within 10 minutes of booting. The whole interface becomes unresponsive, with only the mouse cursor still moving, and I have to do a hard reset with the power button. There is no obvious trigger for this behaviour; it happens regardless of which program I’m using. It can even hang without any input at all.
I previously faced this issue and resorted to reinstalling the system. That worked for a couple of weeks, but the problem is back. I have not made any hardware changes in months, and I have already updated the system, to no avail. I’m currently in a live environment.
Hardware information can be found here. Partition and format info is found here. journalctl logs are found here. I’m using KDE Plasma.

I appreciate any help. This is really frustrating.

This obviously means that an upgraded package is causing the issue.

Correct me if I’m wrong, but these logs appear to be for the boot subsequent to the crash? Only the journal logs around the time the system starts to become unresponsive can be helpful.

Regardless, if the system works fine after a clean install but starts failing after a period of usage, the logical conclusion here is that some update broke the system. Therefore, your next course of action would be to check your pacman logs to identify the problematic update. Check the timestamps in the pacman log that correspond to the time just before the problem started to occur. Once you’ve identified the suspected packages, downgrade them and see if the problem goes away.
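As a rough sketch of that step, the Python below pulls the `upgraded` entries out of a pacman log and keeps only those inside a window before the problem first appeared. It assumes the ISO timestamp format current pacman writes (e.g. `[2023-08-11T10:15:32+0200]`); the helper name, the three-day default window and the sample data are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helper. Matches pacman.log lines such as:
# [2023-08-11T10:15:32+0200] [ALPM] upgraded mesa (1:23.1.4-1 -> 1:23.1.5-1)
UPGRADE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)$"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def upgrades_before(lines, problem_start, window=timedelta(days=3)):
    """Return (time, package, old_version, new_version) for every upgrade
    logged between problem_start - window and problem_start (inclusive)."""
    hits = []
    for line in lines:
        m = UPGRADE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # installs, removals, scriptlet output, etc.
        try:
            ts = datetime.strptime(m["ts"], TS_FORMAT)
        except ValueError:
            continue  # older "[YYYY-MM-DD HH:MM]" entries are skipped
        if problem_start - window <= ts <= problem_start:
            hits.append((ts, m["pkg"], m["old"], m["new"]))
    return hits
```

On the affected install you would pass the opened `/var/log/pacman.log` as `lines`, with `problem_start` set to roughly when the hangs began.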

Here’s a friendly reminder: do not hard-reset the machine with the power button.

If your system hangs, the first thing to try is switching to a TTY. If you can, you will have access to the journal logs, which will help you diagnose the problem. If all else fails, the SysRq key and REISUB are your friends. If you want to know how to set them up, there is an excellent guide on this very topic written by our friendly neighbourhood Frogman.
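REISUB only works if the kernel allows those SysRq functions. According to the kernel’s SysRq documentation, `/proc/sys/kernel/sysrq` (the `kernel.sysrq` sysctl) is 0 (disabled), 1 (everything enabled) or a bitmask of allowed function groups. A minimal Python sketch that checks which REISUB keys a given value permits; the bit values come from that documentation, while the helper names are hypothetical.

```python
# Bit values from the kernel's SysRq documentation (kernel.sysrq bitmask).
KEYBOARD = 0x4     # keyboard control (SAK, unraw) -> R
SYNC = 0x10        # emergency sync                -> S
REMOUNT_RO = 0x20  # remount read-only             -> U
SIGNAL = 0x40      # term, kill, oom-kill          -> E, I
REBOOT = 0x80      # reboot/poweroff               -> B

REISUB = [("R", KEYBOARD), ("E", SIGNAL), ("I", SIGNAL),
          ("S", SYNC), ("U", REMOUNT_RO), ("B", REBOOT)]


def reisub_keys_allowed(sysrq_value: int) -> dict:
    """Map each REISUB key to whether this kernel.sysrq value permits it."""
    if sysrq_value == 1:  # 1 means "enable all SysRq functions"
        return {key: True for key, _ in REISUB}
    return {key: bool(sysrq_value & bit) for key, bit in REISUB}


def read_sysrq(path="/proc/sys/kernel/sysrq") -> int:
    """Read the currently active value from procfs."""
    with open(path) as f:
        return int(f.read().strip())
```

Running `reisub_keys_allowed(read_sysrq())` on the live system shows what is currently enabled; a value of 244 (4 + 16 + 32 + 64 + 128) would allow the whole sequence without enabling everything.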


I did not reply yesterday as the system kept working after several reboots and I needed to use it.

Correct me if I’m wrong, but these logs appear to be for the boot subsequent to the crash?

Yes, that’s correct. They’re the only logs I can get. Fetching logs from a live environment through arch-chroot obviously yields nothing.

If your system hangs, the first thing you can do is to try to switch to a TTY.

I did this, but unless I’m using the TTY wrong, typing any command gives me no response.

If all else fails, the sysreq key and REISUB is your friend.

This works, at least, and I get different outputs as I go through the key sequence.

Regardless, if the system works fine after a clean install but starts failing after a period of usage, the logical conclusion here is that some update broke the system. Therefore, your next course of action would be to check your pacman logs to identify the problematic update.

Out of desperation, I rolled back all packages by a week (to 11/08), a point at which I was having no issues, but the issue persisted. With everything rolled back to 04/08, the system appears to work fine. I’m sure this is not the best approach, but I need to use my computer. I’m not sure how to proceed from here without the system breaking again when I update.
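If the rollback was done through the Arch Linux Archive (an assumption; the post doesn’t say which method was used), pacman is pinned to a day by pointing the mirrorlist at that day’s snapshot. A tiny Python sketch that builds the line; the helper name and the year are placeholders, and the dates in this thread are day/month, so 04/08 is 4 August.

```python
from datetime import date

# Dated snapshots of the Arch repositories in the Arch Linux Archive.
ALA_BASE = "https://archive.archlinux.org/repos"


def ala_server_line(day: date) -> str:
    """Build a mirrorlist entry for the archive snapshot of `day`.
    $repo and $arch are literal text that pacman expands itself."""
    return f"Server = {ALA_BASE}/{day:%Y/%m/%d}/$repo/os/$arch"
```

With that as the only `Server` line in `/etc/pacman.d/mirrorlist`, `pacman -Syyuu` syncs to the snapshot and downgrades anything newer. The archive only covers the Arch repositories, not the EndeavourOS one.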

Okay. The next step would be to check the pacman logs for the list of packages updated in the first update you performed after 04/08. That should help narrow down the list of culprits.
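Picking out “the first update after 04/08” means grouping log lines by transaction. pacman brackets each transaction with `[ALPM] transaction started` and `[ALPM] transaction completed` lines, so a rough Python sketch could look like this. It assumes ISO timestamps such as `[2023-08-07T09:00:00+0200]`; the helper name, the year and the sample entries are illustrative.

```python
import re
from datetime import datetime

# Hypothetical helper; only [ALPM] lines are considered.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[ALPM\] (?P<msg>.*)$")
UPGRADED_RE = re.compile(r"^upgraded (?P<pkg>\S+) \(")
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def first_transaction_after(lines, cutoff):
    """Return (start_time, [upgraded packages]) for the first completed pacman
    transaction that started after `cutoff` and upgraded something, else None."""
    current = None  # (start_time, packages) while inside a relevant transaction
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # [ALPM-SCRIPTLET], [PACMAN] and other lines
        msg = m["msg"]
        if msg == "transaction started":
            try:
                ts = datetime.strptime(m["ts"], TS_FORMAT)
            except ValueError:
                current = None  # older timestamp format; ignore this transaction
                continue
            # A new "started" line also discards a transaction that never completed.
            current = (ts, []) if ts > cutoff else None
        elif current is not None:
            up = UPGRADED_RE.match(msg)
            if up:
                current[1].append(up["pkg"])
            elif msg == "transaction completed":
                if current[1]:
                    return current
                current = None  # install/remove-only transaction; keep looking
    return None
```

The returned package list is the candidate set to downgrade or test individually.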

Yep. Hence the general recommendation to avoid forcibly shutting down the machine with the power button. The journaling service can still run if we allow the system to shut down gracefully.

Yep. Hence the general recommendation to avoid forcibly shutting down the machine with the power button. The journaling service can still run if we allow the system to shut down gracefully.

Thank you, this is very helpful to know. I have memorised REISUB for future troubleshooting.

Okay. The next step would be to check the pacman logs for the list of packages updated in the first update you performed after 04/08. That should help narrow down the list of culprits.

What would you suggest is the best way to go about ruling out packages? Update them one by one until the system breaks again?

Have you identified the problematic package?

I do not think the cause is obvious in this case, and this theory seems unlikely to me. They had the issue, and it went away for a few weeks after reinstallation. If a package caused the issue, the problem would recur immediately after bringing the new system up to date, assuming they are using the same packages.

It sounds like it could be a hardware issue. This hunt for a “problematic package” may be a red herring.

A valid point. But we have to start the diagnosis somewhere. What do you recommend? Memtests and disk checks, perhaps?

By the same token, if it’s a hardware issue, would it really take several weeks for the symptoms to resurface? In my experience, hardware issues persist across re-installs and they show up immediately.

Yes, I think this is a good idea!

Testing other kernels, at least the LTS kernel, is also a good starting point. Even if it does not resolve the issue, it may be informative.

Definitely agree with this. Though the OP has yet to provide any feedback.


I apologise for my late reply. Since rolling back to 04/08, the system boots without issues. I will get around to running memtests and disk checks as soon as I can and will post the results.

Testing other kernels, at least the LTS kernel, is also a good starting point. Even if it does not resolve the issue, it may be informative.

I tried this the first time around, before reinstalling the OS. I went with the LTS kernel but it made no difference.

I finally got around to sitting down and trying different things. Things aren’t doing so well.

Memtests and disk checks, perhaps?

I ran memtest86 with no errors. I only did a single pass, but I can do more if needed. I also checked SMART and ran fsck on both my drives. Here are the results for the primary drive, and here for the secondary drive where /home is stored. There are errors, but I think these stem from the NTFS partitions?
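When reading fsck results, it can help to decode the exit status. fsck(8) documents it as a sum of flags (0 no errors, 1 errors corrected, 2 reboot needed, 4 errors left uncorrected, 8 operational error, 16 usage error, 32 cancelled by user, 128 shared-library error), and e2fsck for ext4 uses the same values. A small Python sketch; the function name is hypothetical.

```python
# Exit-status flags as documented in fsck(8).
FSCK_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(code: int) -> list[str]:
    """Split an fsck exit status into the conditions it encodes."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_FLAGS.items() if code & bit]
```

For example, a check of the ext4 root partition that exits with 4 would mean errors were found but left uncorrected, which is worth knowing independently of any NTFS noise.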

I had a working system so I tried to upgrade the system and start probing for problematic packages. Just my luck, I experienced power loss during the upgrade process which wreaked havoc on everything: bootloader, kernel, packages, configurations, etc. I wasted an entire evening just to have a half-working system.

I resorted to reinstalling the OS again, and the issue is back right away. I thought downgrading packages to 04/08 might work, but now it makes no difference. I installed the LTS kernel, to no avail.

I’m at a complete loss here.

EDIT: I booted into my Windows partition and have no problems there. Could this indicate that it’s not a hardware issue?

Does the issue occur on other desktop environments? What about the XFCE version?

Does the issue also occur when you’re using a tty? If you aren’t able to reproduce the issue on the tty, then it might be graphics related.

Here’s everything I’ve done so far.

Does the issue occur on other desktop environments? What about the XFCE version?

From a live environment, I chroot’d into the system, installed xfce4 and turned off autologin.

I rebooted the computer and logged into XFCE. No issues. I rebooted a couple of times to make sure.

Does the issue also occur when you’re using a tty? If you aren’t able to reproduce the issue on the tty, then it might be graphics related.

As mentioned above, when the computer hung, I was able to switch to a TTY but got no login prompt. I decided to try again anyway. Before that, I changed the systemd-boot resolution following the steps here.

Once done, I switched session from XFCE over to Plasma and… it’s now working fine? I’ve rebooted a couple of times to test it out, but it no longer hangs. I’ve been typing this whole post on Plasma without issue. Still, I want to know what changed to identify the issue.

Test with XFCE for a few days. If you can run XFCE without any issues, then it’s probably plasma-related.

Going back and forth to Plasma, I managed to reproduce the issue, and this time around I got this on tty3:

EXT4-fs error (device nvme0n1p2): _ext4_find_entry:1678: inode #3029648: comm plasmashell: reading directory lblock 0
Buffer I/O error on dev nvme0n1p2, logical block 0, lost sync page write
EXT4-fs (nvme0n1p2): I/O error while writing superblock

nvme0n1p2 is the partition where root is installed.

Could it be a partition issue? I’m not inclined to believe it’s an issue with the SSD, because I face no problems with Windows installed on nvme0n1p4. It also mentions plasmashell, which may indicate a Plasma issue?

It means that some blocks on that partition could not be written. This matches the result of the fsck check you ran on your primary drive: fsck failed to recognize that block as part of an ext4 file system even though the partition was formatted as ext4. In other words, that block is corrupted.
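To see whether these errors keep recurring on the same partition, one could scan saved kernel messages for the kinds of lines quoted above. A minimal Python sketch; the regular expressions are written only against the three messages in this thread and may miss other wordings the kernel uses, and the helper name is hypothetical.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this thread.
PATTERNS = [
    ("ext4 error", re.compile(r"EXT4-fs error \(device (?P<dev>[^)]+)\)")),
    ("ext4 I/O error", re.compile(r"EXT4-fs \((?P<dev>[^)]+)\): I/O error")),
    ("buffer I/O error", re.compile(r"Buffer I/O error on dev (?P<dev>[^,]+),")),
]


def count_disk_errors(lines):
    """Count matching kernel messages per (device, kind)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                counts[(m["dev"], kind)] += 1
    return counts
```

If the journal from the hung boot was persisted, `journalctl -k -b -1` run on the next boot prints that boot’s kernel messages, which could be saved to a file and fed to this function.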

I recommend backing up your data from the tty.

Were you able to reproduce the issue on XFCE?

Were you able to reproduce the issue on XFCE?

In the time it would have taken Plasma to crash, XFCE keeps working just fine. As noted above, I installed vanilla XFCE alongside Plasma. I’m thinking of trying the default EndeavourOS installation with XFCE, but I’d rather get Plasma working, as that’s what I’m used to and what I use on my laptop as well.

Back up all your data first. Can’t rule out hardware yet.

In the meantime, test the XFCE version and see how it goes.