Serious Error - Twice

Today, for the second time in the last couple of months, I ran into an extremely frustrating and catastrophic error with EndeavourOS. Basically what happened is that while browsing the internet the system froze up and then crashed. When I restart, it seems that the memory has gotten slightly corrupted because it gives me errors and has me manually run fsck. Last time I managed to get the system started again enough to pull data, but it was extremely and increasingly unstable. I think it’s happened again tonight because I got a crash out of the blue with some similar symptoms. If I can’t get this to stop happening there’s just no way I can continue using this OS.

For some more specifics - my computer is a Thinkpad X1 Carbon 7th generation. I don’t remember ever having issues like this when running Windows before I first installed EndeavourOS or after the first time it happened. I’ve also had some firmware issues with the battery not being recognized and the system shutting down because it thinks the battery isn’t present, but I fixed those with a firmware update - although they first presented after this error first appeared and weren’t fixed until I had updated some firmware. The first time the error occurred things started freezing up in the GUI, programs crashed then refused to start, then the console returned “Segmentation Fault” (I think, it might have been another fault) for any input and the light in my caps lock key started flashing. This time Firefox crashed, so I tried updating the system with pacman -Syu and halfway through that the GUI started freezing up and the caps lock light started flashing. Then the computer crashed to a black screen and didn’t change after several minutes so I powered it off. On startup it said something about corrupted files and had me run fsck, which let it start. When I tried updating again pacman gave me errors, so I fixed it with a page from Arch Wiki. Then I restarted and tried using my computer normally, but then Firefox crashed again and then the whole computer crashed so I think the damage has been done again.

I know this description is long and not terribly specific, but I’m just really frustrated and tired of these issues. I really don’t want to go back to Windows, but with errors like this there’s no way I can justify using this OS as a daily driver, so I really want to figure out what’s wrong. Please ask any questions to help diagnose the problem, and I’ll keep careful track of any issues I run into from here on out.

Thanks for any help,
WaryAndroid

Hi and welcome to the forum.

First, please read this:

Without including the relevant system logs, it would be very difficult to help you. If you want to maximise your chances of getting help, a good idea is to read this, as well.

It is probably not productive to try to guess what is wrong with your system, but trying out the LTS kernel can’t hurt, and might fix your system freezes. To do so, install the packages linux-lts and linux-lts-headers, reboot and during boot, select the LTS kernel.

Also check this:

https://wiki.archlinux.org/title/Lenovo_ThinkPad_X1_Carbon_(Gen_7)


Second, if your computer for some reason freezes (and this can happen from time to time, it’s nothing catastrophic), do not cut power to it, as this can cause filesystem corruption. This is why fsck ran after you did this, and will probably run on the next few reboots, too. Hopefully, there was no filesystem corruption as this can sometimes be tricky to fix without reinstalling the OS.

On Linux, there is rarely ever a reason to do a hardware reset, for example, by cutting power to your computer, or by using the reset button on your case. Doing so can cause more problems than the original crash.

Instead, enable the magic SysRq key in advance, and use it to safely Reboot Even If System Utterly Broken:
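As a rough sketch of what that involves: the kernel exposes the current SysRq setting in /proc/sys/kernel/sysrq, and a sysctl drop-in can enable it persistently. The helper name sysrq_status and the drop-in file name 99-sysrq.conf below are illustrative choices, not something your system requires.

```shell
# Hypothetical helper: report how the magic SysRq key is currently configured.
# Reads /proc/sys/kernel/sysrq by default; another file can be passed for testing.
sysrq_status() {
    sysrq_file=${1:-/proc/sys/kernel/sysrq}
    sysrq_value=$(cat "$sysrq_file")
    case "$sysrq_value" in
        0) echo "disabled" ;;
        1) echo "fully enabled" ;;
        *) echo "partially enabled (bitmask $sysrq_value)" ;;
    esac
}

# To enable all SysRq functions persistently (needs root), one common approach
# is a sysctl drop-in. The file name is an arbitrary choice:
#   echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/99-sysrq.conf
#   sudo sysctl --system
```

With SysRq enabled, the sequence is: hold Alt+SysRq and press R, E, I, S, U, B in order, pausing a second or two between keys. S syncs data to disk and U remounts filesystems read-only before B reboots, which is what makes this safer than cutting power.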


Third, and slightly off topic:

Frustration is not your friend. If you are completely new to Linux, there is going to be a certain period (maybe 6 months, maybe a year) where nothing will seem to go your way, simply because Linux is very different from windoze. I would recommend adopting a curious, problem-solving and learning mindset. Think of it as an endeavour, a journey into an unknown, exciting land called: the outside of your comfort zone. There will be many setbacks on this journey, but if you persist it will be very rewarding.

If you find yourself lacking the inquisitive, DIY mindset necessary for using Arch Linux (which is basically what EndeavourOS is) and would prefer things to (mostly) just work, then a distribution like Linux Mint might be a better choice, at least for the first few months of your Linux journey.

Welcome to the forum, @WaryAndroid!

@WaryAndroid
Here is some info from the Arch wiki you should look at regarding your particular laptop. There are lots of users with Lenovo products and also Carbon models.

https://wiki.archlinux.org/title/Lenovo_ThinkPad_X1_Carbon_(Gen_7)

Here’s the logs, sorry for not including those originally:
Hardware: https://clbin.com/ts7pK
Boot log: https://clbin.com/GtYMm

As I was writing this reply, the battery issue I mentioned came back with a vengeance and forced a shutdown, corrupting the file system much worse than it was before. It booted to fsck twice, but now it’s going to grub recovery. I’m going to try and salvage the very few files not backed up later and then reinstall EndeavourOS.

I’ve been using EndeavourOS for a couple of years now, actually, except for after the first time this happened when I switched back to Windows briefly. I’m very familiar with the learning curve and weird issues, but I’ve found this death spiral series of issues a huge problem, particularly since it always seems to happen at the worst times.

In the future I’ll try and submit bug reports for the freezes and errors before they get totally out of hand. I’ll also have to enable that shutdown sequence.

Did you read the info from the Arch Wiki? There are a number of issues that need to be dealt with because of bugs in the Firmware etc. You should read it over and make the necessary changes and adjustments as required.

Well none of those “issues” seem to be really relevant to the problem where the system is just crashing during use.

Yeah, I read that when I first installed Endeavour and it helped with the battery thing. I think what happened is that my power off screwed with the battery firmware and brought back that issue, which made the computer shut itself down and completely corrupt the file system beyond any recovery.

Is there any way I can keep the system from shutting itself down when it stops detecting the battery or when it reads as 0%? Windows just gave me an error about my battery, instead of catastrophically shutting down and corrupting my whole drive.

Honestly, I don’t really know, unless there are any BIOS updates available for your hardware. You could try adding this kernel parameter to the default GRUB command line and then updating GRUB:

nvme_core.default_ps_max_latency_us=0

Edit: Don’t know if you tried but this is supposed to turn APST off.

You have these errors:

Jan 01 22:14:19 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524973008 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:19 the-nexus kernel: EXT4-fs warning (device nvme0n1p2): htree_dirblock_to_tree:1028: inode #16387213: lblock 0: comm systemd: error -5 reading directory block
Jan 01 22:14:19 the-nexus systemd[1]: var-lib-snapd-snap-discord-131.mount: Failed to check directory /var/lib/snapd/snap/discord/131: Input/output error
...
Jan 01 22:14:22 the-nexus polkitd[452]: <no filename>:4: subject=[Subject pid=540 user='lightdm' groups=lightdm seat='seat0' session='c1' local=true active=true]
Jan 01 22:14:22 the-nexus at-spi-bus-launcher[568]: dbus-daemon[568]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=970 pid=540 comm="/usr/bin/slick-greeter ")
Jan 01 22:14:22 the-nexus at-spi-bus-launcher[611]: dbus-daemon[611]: writing oom_score_adj error: Permission denied
Jan 01 22:14:22 the-nexus at-spi-bus-launcher[568]: dbus-daemon[568]: Successfully activated service 'org.a11y.atspi.Registry'
...

Jan 01 22:14:35 the-nexus gnome-keyring-ssh.desktop[704]: SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
Jan 01 22:14:35 the-nexus gnome-keyring-secrets.desktop[705]: SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
Jan 01 22:14:35 the-nexus gnome-keyring-pkcs11.desktop[706]: SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
Jan 01 22:14:35 the-nexus gsd-usb-protect[733]: Failed to get screen saver status: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.ScreenSaver was not provided by any .service files
Jan 01 22:14:35 the-nexus kernel: rfkill: input handler disabled
Jan 01 22:14:35 the-nexus gsd-usb-protect[733]: Failed to fetch USBGuard parameters: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.usbguard1 was not provided by any .service files
Jan 01 22:14:35 the-nexus at-spi-bus-launcher[686]: dbus-daemon[686]: Activating service name='org.a11y.atspi.Registry' requested by ':1.1' (uid=1000 pid=713 comm="/usr/lib/gsd-power ")
Jan 01 22:14:35 the-nexus at-spi-bus-launcher[795]: dbus-daemon[795]: writing oom_score_adj error: Permission denied
...
Jan 01 22:14:37 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:37 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0
and it keeps looping on the ext4 filesystem corruption.

Sorry…I wasn’t sure what you were asking. I see the errors now.

Edit: Yes, that is not good!

@WaryAndroid
Do you have a setting in the BIOS to turn off APST? If so, turn it off and then try adding this kernel parameter:

nvme_core.default_ps_max_latency_us=5500

I ended up not being able to salvage anything from that system (thank goodness for backups) and did a fresh install. Should I still add that on my new build? If so, how do I add that as a kernel parameter/which config file do I put it in? I read the arch wiki page on kernel parameters but I didn’t quite get it.

You have to add the parameter mentioned above to /etc/default/grub. You can use any text editor to do this, for example: sudo nano /etc/default/grub

Then find this line GRUB_CMDLINE_LINUX_DEFAULT=. It should contain some parameters already. Add the mentioned parameter to the end of the line within the double-quotes.

Ex:
NOTE: Don’t copy-paste the below code. It’s just for reference.

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"

Then rebuild grub.cfg by running this command:
sudo grub-mkconfig -o /boot/grub/grub.cfg

Once the command completes successfully without errors you can restart your system.
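If hand-editing feels error-prone, the steps above could be sketched roughly as the two helpers below. The function names add_grub_param and has_kernel_param are made up for illustration, and the sketch assumes the GRUB_CMDLINE_LINUX_DEFAULT value is on a single double-quoted line, as in the example. Try it on a copy of the file first and review the result before copying it back.

```shell
# Hypothetical helper: append a parameter to GRUB_CMDLINE_LINUX_DEFAULT in the
# given file, keeping whatever parameters are already there.
# Assumes the parameter contains no '|', '&' or backslash characters.
add_grub_param() {
    grub_param=$1
    grub_file=$2
    # Skip the edit if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$grub_param"; then
        return 0
    fi
    _grub_tmp=$(mktemp)
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${grub_param}\"|" \
        "$grub_file" > "$_grub_tmp" && cat "$_grub_tmp" > "$grub_file"
    rm -f "$_grub_tmp"
}

# Hypothetical helper: succeed if the parameter appears as a whole word on the
# kernel command line (default: the running kernel's /proc/cmdline).
has_kernel_param() {
    kernel_param=$1
    cmdline_file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$kernel_param"
}

# Possible usage (paths other than /etc/default/grub are arbitrary):
#   cp /etc/default/grub /tmp/grub.new
#   add_grub_param nvme_core.default_ps_max_latency_us=5500 /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new      # review the change
#   sudo cp /tmp/grub.new /etc/default/grub
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
# After rebooting:
#   has_kernel_param nvme_core.default_ps_max_latency_us=5500 && echo "active"
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

Appending to the existing line, rather than pasting a whole new one, is why the example line above should not be copied verbatim: your existing parameters may differ.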

With the looped errors I saw, I can understand you not being able to salvage it. You can try that kernel parameter with the instructions that @s4ndm4n has given and see if it makes any difference or not. Some of these M.2 drives have issues due to firmware problems or poor BIOS implementation. The Adata drive is one that doesn’t have a great reputation from what I can gather. I don’t have a definitive answer to the problem. It’s a little bit of trial and error to try and narrow down the cause.

I want to emphasize what @Stephane already pointed out: You have plenty of nvme errors in the log. Just search for the keyword “sector” in your log to find them.
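A minimal sketch of that search, assuming the log is available as plain text (a saved file, or journalctl output piped in). The function name count_medium_errors is made up for illustration. It counts how often each sector shows up in a "critical medium error" line, so a sector that keeps failing stands out.

```shell
# Hypothetical helper: summarise "critical medium error" lines per sector.
# Reads log text on stdin and prints "<count> <sector>", most frequent first.
count_medium_errors() {
    grep 'critical medium error' \
        | sed -n 's/.*sector \([0-9][0-9]*\).*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}

# Possible usage:
#   journalctl -k -b | count_medium_errors     # kernel messages, current boot
#   count_medium_errors < saved-boot-log.txt   # file name is just an example
```

Repeated read failures on the same few sectors, as in this log, are what point towards the drive itself.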

This suggests to me that you are facing a hardware issue. If you still have warranty coverage for this laptop, you should get a replacement NVMe drive.

Example:

Jan 01 22:14:19 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524973008 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:19 the-nexus kernel: EXT4-fs warning (device nvme0n1p2): htree_dirblock_to_tree:1028: inode #16387213: lblock 0: comm systemd: error -5 reading directory block

Jan 01 22:14:38 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:38 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0
Jan 01 22:14:38 the-nexus systemd[643]: Starting Virtual filesystem service - Media Transfer Protocol monitor...

Jan 01 22:14:38 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:39 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0

Jan 01 22:14:39 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:39 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0

Jan 01 22:14:40 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:40 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0

Jan 01 22:14:40 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:41 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0

Jan 01 22:14:41 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:41 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0
Jan 01 22:14:42 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:42 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0

Jan 01 22:14:42 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:42 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0
Jan 01 22:14:43 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:43 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0
Jan 01 22:14:43 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975640 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:43 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16385601: comm zeitgeist-datah: reading directory lblock 0


Jan 01 22:14:45 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975240 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:45 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16386973: comm StreamTrans #3: reading directory lblock 0

Jan 01 22:14:45 the-nexus kernel: blk_update_request: critical medium error, dev nvme0n1, sector 524975240 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 01 22:14:45 the-nexus kernel: EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1612: inode #16386973: comm StreamTrans #2: reading directory lblock 0

I have seen this issue with my workplace laptops. There it mainly shows up as displayed error codes, and the giveaway is the blink sequence. But in your case, as many above pointed out, your NVMe is dying. You should try to get the NVMe replaced. If you have a warranty, you should hurry.

That’s really good to know, thank you so much for translating those logs. I’m working on a warranty claim now.