Random crashes on fresh install: EXT4-fs error

Hi!
I freshly installed Endeavour on my laptop yesterday and have been loving the experience, but I am also experiencing random crashes multiple times a day. I am running KDE and my system info is here: https://clbin.com/egJw0

The crashes occur anywhere from every 10 minutes to every 3 hours. Yesterday I got an error message about the backlight brightness, which I had also experienced on a previous Manjaro installation, which is why I added acpi_backlight=vendor to the kernel parameters in the GRUB bootloader. I am not getting that error anymore, but I am still experiencing the random crashes. See the two attached images of the crashes from yesterday and today.


I haven’t really made any big changes since the install, just customizations.

I don’t think the problem comes from ext4, as it is very stable. Try using the linux-lts kernel and see if the crashes continue 🤷

I will try that out and see if it helps with the crashes. Thanks!

I wasn’t sure how to label the post that’s why I mentioned the EXT4 error in the title :smiley:

So after testing it out for a bit I have still experienced two crashes, but now the system freezes first for about 10-15 seconds.

Weird, I would say the disk is toast… but that is probably just a false assumption…

Is it set to AHCI in the BIOS, and is the fstab OK?

Return this from a terminal please:
inxi -Fxxxza --no-host

results of inxi
System:    Kernel: 5.10.74-1-lts x86_64 bits: 64 compiler: gcc v: 11.1.0 
           parameters: BOOT_IMAGE=/boot/vmlinuz-linux-lts root=UUID=3db8f44a-0254-4367-b354-28b9c51689eb rw quiet loglevel=3 
           nowatchdog acpi_backlight=vendor 
           Desktop: KDE Plasma 5.23.0 tk: Qt 5.15.2 wm: kwin_x11 vt: 1 dm: SDDM Distro: EndeavourOS base: Arch Linux 
Machine:   Type: Laptop System: HUAWEI product: KPL-W0X v: M1D serial: <filter> 
           Mobo: HUAWEI model: KPL-W0X-PCB v: M1D serial: <filter> UEFI: HUAWEI v: 1.22 date: 02/26/2019 
Battery:   ID-1: BAT1 charge: 53.9 Wh (99.1%) condition: 54.4/56.3 Wh (96.6%) volts: 8.3 min: 7.6 
           model: DYNAPACK HB4593R1ECW type: Li-ion serial: <filter> status: Discharging cycles: 120 
CPU:       Info: Quad Core model: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx bits: 64 type: MT MCP arch: Zen 
           family: 17 (23) model-id: 11 (17) stepping: 0 microcode: 8101007 cache: L2: 2 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 31938 
           Speed: 1458 MHz min/max: 1600/2000 MHz boost: enabled Core speeds (MHz): 1: 1458 2: 1369 3: 1368 4: 1368 5: 1368 
           6: 1368 7: 1485 8: 1593 
           Vulnerabilities: Type: itlb_multihit status: Not affected 
           Type: l1tf status: Not affected 
           Type: mds status: Not affected 
           Type: meltdown status: Not affected 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: disabled, RSB filling 
           Type: srbds status: Not affected 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] vendor: Huawei driver: amdgpu 
           v: kernel bus-ID: 03:00.0 chip-ID: 1002:15dd class-ID: 0300 
           Display: x11 server: X.org 1.20.13 compositor: kwin_x11 driver: loaded: amdgpu,ati 
           unloaded: fbdev,modesetting,vesa resolution: <missing: xdpyinfo> 
           Message: Unable to show advanced data. Required tool glxinfo missing. 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio vendor: Huawei 
           driver: snd_hda_intel v: kernel bus-ID: 03:00.1 chip-ID: 1002:15de class-ID: 0403 
           Device-2: Advanced Micro Devices [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Huawei driver: N/A 
           alternate: snd_pci_acp3x, snd_rn_pci_acp3x bus-ID: 03:00.5 chip-ID: 1022:15e2 class-ID: 0480 
           Device-3: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: Huawei driver: snd_hda_intel v: kernel 
           bus-ID: 03:00.6 chip-ID: 1022:15e3 class-ID: 0403 
           Sound Server-1: ALSA v: k5.10.74-1-lts running: yes 
           Sound Server-2: JACK v: 1.9.19 running: no 
           Sound Server-3: PulseAudio v: 15.0 running: yes 
           Sound Server-4: PipeWire v: 0.3.38 running: no 
Network:   Device-1: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel bus-ID: 01:00.0 chip-ID: 8086:24fd class-ID: 0280 
           IF: wlan0 state: up mac: <filter> 
Drives:    Local Storage: total: 465.76 GiB used: 11.08 GiB (2.4%) 
           SMART Message: Unable to run smartctl. Root privileges required. 
           ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Western Digital model: WDS500G2B0C-00PXH0 size: 465.76 GiB block-size: 
           physical: 512 B logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter> rev: 211070WD temp: 25.9 C 
           scheme: GPT 
Partition: ID-1: / raw-size: 60 GiB size: 58.76 GiB (97.93%) used: 9.87 GiB (16.8%) fs: ext4 dev: /dev/nvme0n1p8 
           maj-min: 259:8 
           ID-2: /boot/efi raw-size: 512 MiB size: 511 MiB (99.80%) used: 300 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p5 
           maj-min: 259:5 
           ID-3: /home raw-size: 121.17 GiB size: 118.7 GiB (97.97%) used: 1.21 GiB (1.0%) fs: ext4 dev: /dev/nvme0n1p9 
           maj-min: 259:9 
Swap:      Kernel: swappiness: 60 (default) cache-pressure: 100 (default) 
           ID-1: swap-1 type: partition size: 2 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/nvme0n1p10 maj-min: 259:10 
Sensors:   System Temperatures: cpu: 33.0 C mobo: N/A gpu: amdgpu temp: 33.0 C 
           Fan Speeds (RPM): N/A 
Info:      Processes: 233 Uptime: 1m wakeups: 1 Memory: 6.75 GiB used: 1.82 GiB (27.0%) Init: systemd v: 249 tool: systemctl 
           Compilers: gcc: 11.1.0 Packages: pacman: 1021 lib: 236 Shell: Zsh v: 5.8 running-in: konsole inxi: 3.3.06 

Kind of unsure how to check it, still somewhat of a newbie :laughing:
But here is the contents of my fstab

fstab
# <file system>             <mount point>  <type>  <options>  <dump>  <pass>
UUID=49AA-9B72                            /boot/efi      vfat    umask=0077 0 2
UUID=3db8f44a-0254-4367-b354-28b9c51689eb /              ext4    defaults,noatime 0 1
UUID=23f42881-5d9e-481d-b500-5ff0b75777d3 /home          ext4    defaults,noatime 0 2

I am dual-booting Windows on this laptop.

You are running linux-lts. But the regular linux kernel should include many enhancements for AMD processors. Does switching back to it help at all?

Another thing: is there an update from the vendor for the motherboard BIOS? Yours is nearly 3 years old.
Note that if you update the BIOS, be sure to use exactly the right BIOS for your machine. Otherwise you’ll just brick it.

I started using the linux-lts after the suggestion of @lighttigerXIV. I was experiencing the crashes before and after the change of kernel.

I will check for BIOS updates, thank you :slight_smile:

Edit: Apparently there is an updated BIOS version available, it was just very well hidden on the Huawei website. Will let you know how the situation changes after I complete the update.
Edit 2: Never mind, I read the numbers wrong. Sadly, the installed BIOS version is the newest one available.

could you check journal for:
journalctl | grep resetting
?
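
If a single keyword turns up nothing, a slightly wider search might be worth a try. This is only a rough sketch: the helper name `storage_errors` and the pattern list are made up for illustration and are certainly not exhaustive.

```shell
# Hypothetical helper: keep only lines that look like storage trouble.
# The pattern list is an illustrative guess, not a complete set.
storage_errors() {
    grep -iE 'ext4-fs error|nvme.*(reset|timeout)|i/o error'
}

# Usage (kernel messages from the previous boot; needs a persistent journal):
#   journalctl -k -b -1 --no-pager | storage_errors
```

Looking at the previous boot (`-b -1`) is the point here, since the interesting messages are the ones logged right before a crash.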

And to make sure filesystem is O.K.:
sudo fsck.ext4 /dev/nvme0n1p8
As it needs the partition to be unmounted, you need to run that from a live ISO.
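
To be on the safe side, you could confirm that the partition really is unmounted before checking it. A minimal sketch, assuming the device name from the inxi output; the helper `is_mounted` is hypothetical and simply looks the device up in a mounts table.

```shell
# Hypothetical helper: succeed if device $1 appears as a mount source
# in a mounts table (default: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage from the live ISO:
#   if is_mounted /dev/nvme0n1p8; then
#       echo "still mounted, do not run fsck"
#   else
#       sudo fsck.ext4 -f /dev/nvme0n1p8   # -f: check even if marked clean
#   fi
```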

grep doesn’t find any entry for resetting

Edit: Will check the filesystem later and let you know :slight_smile:

That's a good sign! So the hard drive is most likely not faulty.
Are you sure the drive is connected well? It could also be some dust…

It is always a possibility that the drive could be not correctly connected, but I haven’t had any problems on the parallel windows install, which I would expect with a faulty drive connection. Can’t really check without opening up the laptop (which kind of sucks on this model)

A bad connection is not that common on a notebook, though.

will be the one you should try
plus check trimming:
systemctl status fstrim.timer

To be fair it is a reasonable guess, since I upgraded the drive last year.

:wink: But if it runs fine from the same drive on Windows… I would only go for it as a last resort.

fstrim
○ fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; disabled; vendor preset: disabled)
     Active: inactive (dead)
    Trigger: n/a
   Triggers: ● fstrim.service
       Docs: man:fstrim

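So the weekly trim timer is disabled. Your fstab does not list the discard option either, so it looks like the drive is not being trimmed at all right now. You could try enabling the timer; periodic trimming also puts less load on the drive than continuous trimming would.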
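
Whether continuous trimming is active can be checked from the fstab itself: it would show up as a `discard` mount option. A rough sketch, with a hypothetical helper `discard_mounts` that prints the mount points using that option:

```shell
# Hypothetical helper: print mount points whose options include "discard"
# (continuous TRIM) in an fstab-style file (default: /etc/fstab).
discard_mounts() {
    awk '$1 !~ /^#/ && NF >= 4 && $4 ~ /(^|,)discard(,|$)/ { print $2 }' "${1:-/etc/fstab}"
}

# Usage:
#   discard_mounts                            # no output: no continuous TRIM
#   sudo systemctl enable --now fstrim.timer  # enable the weekly TRIM timer
```

Afterwards, `systemctl status fstrim.timer` should report the timer as active with a next trigger time.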
