Just updated, system died

Tried downgrading system packages
Tried running reinstall-kernels
Tried the LTS kernel

Now completely stuck - any ideas?

System:
  Kernel: 6.7.1-arch1-1 arch: x86_64 bits: 64
  Desktop: KDE Plasma v: ERR-101 Distro: EndeavourOS
Machine:
  Type: Laptop System: LENOVO product: 82JY v: Legion 5 17ACH6H serial: <filter>
  Mobo: LENOVO model: LNVNB161216 v: SDK0R32862 WIN serial: <filter> UEFI: LENOVO v: GKCN60WW
    date: 03/07/2023
Battery:
  ID-1: BAT0 charge: 62.6 Wh (75.1%) condition: 83.4/80.0 Wh (104.3%) volts: 15.8 min: 15.4
CPU:
  Info: 8-core model: AMD Ryzen 7 5800H with Radeon Graphics bits: 64 type: MT MCP cache: L2: 4 MiB
  Speed (MHz): avg: 466 min/max: 400/4463 cores: 1: 400 2: 400 3: 400 4: 400 5: 400 6: 400
    7: 1456 8: 400 9: 400 10: 400 11: 400 12: 400 13: 400 14: 400 15: 400 16: 400
Graphics:
  Device-1: NVIDIA GA104M [GeForce RTX 3070 Mobile / Max-Q] driver: nouveau v: kernel
  Device-2: Bison Integrated Camera driver: uvcvideo type: USB
  Display: server: X.org v: 1.21.1.13 with: Xwayland v: 24.1.0 driver: X: loaded: nvidia
    unloaded: modesetting gpu: nouveau resolution: 1920x1080
  API: EGL v: 1.5 drivers: nouveau,swrast platforms: gbm,surfaceless,device
  API: OpenGL v: 4.5 compat-v: 4.3 vendor: mesa v: 24.1.3-arch1.1 note: incomplete (EGL sourced)
    renderer: NV174, llvmpipe (LLVM 18.1.8 256 bits)
  API: Vulkan Message: No Vulkan data available.
Audio:
  Device-1: NVIDIA GA104 High Definition Audio driver: snd_hda_intel
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor driver: N/A
  Device-3: AMD Family 17h/19h HD Audio driver: snd_hda_intel
  API: ALSA v: k6.7.1-arch1-1 status: kernel-api
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet driver: r8169
  IF: eno1 state: down mac: <filter>
  Device-2: Realtek RTL8852AE 802.11ax PCIe Wireless Network Adapter driver: rtw89_8852ae
  IF: wlan0 state: up mac: <filter>
Bluetooth:
  Device-1: Realtek Bluetooth Radio driver: btusb type: USB
  Report: btmgmt ID: hci0 state: up address: <filter> bt-v: 5.2
Drives:
  Local Storage: total: 2.34 TiB used: 315.45 GiB (13.2%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: MZALQ512HBLU-00BL2 size: 476.94 GiB
  ID-2: /dev/nvme1n1 vendor: Crucial model: CT2000P2SSD8 size: 1.82 TiB
  ID-3: /dev/sda vendor: SanDisk model: Ultra Fit size: 57.28 GiB type: USB
Partition:
  ID-1: / size: 458.75 GiB used: 315.23 GiB (68.7%) fs: ext4 dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 998 MiB used: 228.8 MiB (22.9%) fs: vfat dev: /dev/nvme0n1p1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: 54.9 C mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 16 GiB available: 15.47 GiB used: 2.54 GiB (16.4%)
  Processes: 322 Uptime: 8m Client: systemd inxi: 3.3.35

Updates:

ca-certificates-mozilla 3.101.1-1 -> 3.102-1
ffmpeg 2:7.0.1-1 -> 2:7.0.1-2
kconfig-git 2:6.3.0.r25.gdeaea00-1 -> 2:6.4.0.rc1.r1.g15a4414-1
lib32-krb5 1.21.2-1 -> 1.21.3-1
lib32-nss 3.101.1-1 -> 3.102-1
lib32-sdl2 2.30.4-1 -> 2.30.5-1
lib32-systemd 256.1-1 -> 256.2-1
libplacebo 6.338.2-7 -> 7.349.0-1
libptytty 2.0-4 -> 2.0-5
libuninameslist 20221022-1 -> 20221022-2
libvdpau 1.5-2 -> 1.5-3
libvisual 0.4.2-1 -> 0.4.2-2
lsb-release 2.0.r53.a86f885-1 -> 2.0.r53.a86f885-2
luit 20240102-1 -> 20240102-2
media-player-info 24-2 -> 24-3
netctl 1.29-1 -> 1.29-2
nss-mdns 0.15.1-1 -> 0.15.1-2
nvidia-settings 555.58-1 -> 555.58.02-1
python-validate-pyproject 0.16-1 -> 0.18-1
sdl2 2.30.4-1 -> 2.30.5-1
seatd 0.8.0-1 -> 0.8.0-2
systemd 256.1-1 -> 256.2-1
systemd-libs 256.1-1 -> 256.2-1
systemd-resolvconf 256.1-1 -> 256.2-1
systemd-sysvcompat 256.1-1 -> 256.2-1
wl-clipboard 1:2.2.1-1 -> 1:2.2.1-2
xcb-util-errors 1.0.1-1 -> 1.0.1-2
youtube-dl-git 2021.12.17.r354.37cea84f77-1 -> 2021.12.17.r355.f4b47754d9-1

OK, fsck'd the drive. It is a SAMSUNG MZALQ512HBLU-00BL2.

fsck reported lots of errors. It is a 512 GB NVMe drive; is it on the way out?

sudo smartctl /dev/nvme1n1p2 -a
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.9.7-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZALQ512HBLU-00BL2
Serial Number:                      S65DNE1R330632
Firmware Version:                   7L2QFXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      5
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            381,693,128,704 [381 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 d31111166d
Local Time is:                      Sat Jul  6 20:37:02 2024 BST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     81 Celsius
Critical Comp. Temp. Threshold:     87 Celsius
Namespace 1 Features (0x10):        NP_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.35W       -        -    0  0  0  0        0       0
 1 +     5.35W       -        -    1  1  1  1        0       0
 2 +     5.35W       -        -    2  2  2  2        0     500
 3 -   0.0500W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     1000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    24,665,414 [12.6 TB]
Data Units Written:                 26,891,214 [13.7 TB]
Host Read Commands:                 310,474,156
Host Write Commands:                287,142,684
Controller Busy Time:               1,095
Power Cycles:                       2,687
Power On Hours:                     375
Unsafe Shutdowns:                   313
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               30 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)
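
For anyone reading the report above, here is a minimal Python sketch of how the main health fields could be checked mechanically. The field names and values are copied from the posted output; the helper names (`parse_fields`, `to_int`, `health_flags`) and the choice of which fields to check are illustrative assumptions, not part of smartctl.

```python
# Excerpt of the SMART/Health section posted above (values copied verbatim).
REPORT = """\
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
"""


def parse_fields(text):
    """Return {field name: raw value string} for 'Name:   value' lines."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            fields[name.strip()] = value.strip()
    return fields


def to_int(value):
    """Convert strings such as '2,687', '100%' or '0x00' to an int."""
    value = value.replace(",", "").rstrip("%")
    if value.lower().startswith("0x"):
        return int(value, 16)
    return int(value)


def health_flags(fields):
    """Hypothetical checks: return a list of fields that look worrying."""
    flags = []
    if to_int(fields["Critical Warning"]) != 0:
        flags.append("critical warning set")
    if to_int(fields["Available Spare"]) < to_int(fields["Available Spare Threshold"]):
        flags.append("available spare below threshold")
    if to_int(fields["Media and Data Integrity Errors"]) > 0:
        flags.append("media and data integrity errors logged")
    return flags


flags = health_flags(parse_fields(REPORT))
print(flags if flags else "no health flags raised")
```

On the posted values none of these checks fire, which agrees with the PASSED self-assessment. It says nothing about filesystem-level damage, which fsck reports separately.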

Any other tests to run?

Partition:
ID-1: / size: 458.75 GiB used: 315.23 GiB (68.7%) fs: ext4 dev: /dev/nvme0n1p2
ID-2: /boot/efi size: 998 MiB used: 228.8 MiB (22.9%) fs: vfat dev: /dev/nvme0n1p1

Isn’t reinstall-kernels for systemd-boot?

Looking at the mountpoint for your ESP, you seem to be using GRUB.

That was from the live media (should have said, forgot!). Definitely systemd-boot.

Did you do this while in arch-chroot?

Yes. I think the disk errors were caused by repeatedly having to force a reboot on the 555 NVIDIA drivers. I tried REISUB first every time, but a lot of the time it did not work.
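
As a rough sanity check on that theory, the SMART counters posted earlier can be turned into a share of power cycles that ended uncleanly. This is only a sketch: the two numbers are copied from the smartctl output, `unsafe_share` is a hypothetical helper name, and the counters are lifetime totals, so they cannot show how many of the unsafe shutdowns happened on the 555 drivers.

```python
# Lifetime counters copied from the smartctl output earlier in the thread.
POWER_CYCLES = 2687
UNSAFE_SHUTDOWNS = 313


def unsafe_share(unsafe, cycles):
    """Fraction of power cycles that ended without a clean shutdown."""
    if cycles <= 0:
        raise ValueError("power cycle count must be positive")
    return unsafe / cycles


share = unsafe_share(UNSAFE_SHUTDOWNS, POWER_CYCLES)
print(f"{share:.1%} of power cycles were unsafe shutdowns")  # prints 11.6% ...
```

Roughly one power cycle in nine ending uncleanly fits with filesystem errors showing up in fsck while the drive itself reports no media errors.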

Have run a defrag, just to stress the NVMe a bit, and it was OK (approx. 2.8 million files!).