Aaaand..... I destroyed my SSD (or Plasma did, or WTF?)

linuxislife · February 20, 2025, 12:44pm

So,

I updated this computer running eos w/ kde, went “well”, rebooted, don’t ask me why but I started a
# smartctl -t long /dev/nvme0 (I did a -t short just before the update and it went well)
At this point I wanted to launch firefox, realized all the icons on the task bar were frozen, was like, ok here we go for plasma update… started a terminal… the typing froze… then my mouse froze… well, a nice and sweet REISUB that went as it should have (I assume)… and

boom

first reboot, the BIOS didn’t even know there was an ssd

so, turned off and unplugged everything

BIOS found the SSD, didn’t wanna do anything with it.

well, Live ISO ----> hangs forever on
A start job is running for Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
throw some errors (couldn’t really get it, but about nvme, I think…)
boot, no nvme mountable in the file manager, lsblk → just the usb key

took out the drive to try it in another PC (should have gotten the log of the live iso boot first)

and here I am.

PC with nothing in it but that very nvme ssd and the live ISO running on flash drive.

I dunno what messages showed up when booting the iso (but computer still on if there’s any logs somewhere…) and ssd nvme not showing in the file manager, but lsblk shows nvme0n1 and all its partitions.

Soooooo…

well, first, WTF??? (I mean I thought S.M.A.R.T. stuff was smarter than me, cause, well, I assume it has something to do with the test, could it be otherwise?)

and, if anybody has any idea how to try something (at least just to get one or two files in there that aren’t backed up… of course…), and then (but maybe only after, I mean if there’s any attempt at getting some data without destroying it more) to know if it’s burned forever or… what?!

Thanks so much in advance

keescase · February 20, 2025, 1:12pm

Well, the only thing I can tell you is that I have always read that if you suspect your drive is damaged, you should never test it before you have backed up the important data on it. Because if the drive is damaged already, you would run the risk of damaging the drive even more. Maybe that’s what happened here. Maybe this utility can help you further: https://www.system-rescue.org

rabcor · February 20, 2025, 1:12pm

So you did that and the drive broke down while running it? I’d think that’s quite telling, drive was already bad most likely.

Run it again in this other pc where it is not the root drive, see what it tells you.

I trust that you know to use smartctl -a /dev/nvmex to see the results

dalto · February 20, 2025, 1:32pm

Sounds like a hardware failure. Either drive or controller.

If lsblk shows it, you could try mounting it with the mount command. If it is encrypted, you will need to open the encrypted volume first.
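For reference, a minimal dry-run sketch of what "open the encrypted volume, then mount" could look like. It only prints the commands rather than running them. Everything specific here is an assumption and not from the thread: that the encryption is LUKS (opened with `cryptsetup`), that the data sits on `nvme0n1p5`, and the hypothetical names `rescue` and `/mnt/rescue`. Both steps are read-only so nothing gets written to a drive that may be failing.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# Assumed, not from the thread: LUKS encryption, the partition name,
# the mapper name "rescue" and the mount point /mnt/rescue.
plan_readonly_mount() {
    part=$1   # encrypted partition, e.g. /dev/nvme0n1p5
    name=$2   # name for the opened mapping under /dev/mapper
    mnt=$3    # where to mount the opened volume
    # Open the encrypted container without allowing writes to it.
    echo "sudo cryptsetup open --readonly $part $name"
    echo "sudo mkdir -p $mnt"
    # Mount the opened volume read-only.
    echo "sudo mount -o ro /dev/mapper/$name $mnt"
}

plan_readonly_mount /dev/nvme0n1p5 rescue /mnt/rescue
```

If the partition were not encrypted, the `cryptsetup` step would drop out and the `mount -o ro` would point at the partition itself.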

linuxislife · February 20, 2025, 1:56pm

it is encrypted

I will try to mount it (i will first try to know how to )

but it doesn’t show at all in gparted… not sure it’s a good sign…

can’t, it’s throwing an error

I didn’t suspect anything, and the short test was happy… the worst is, I “tested” because I was testing and setting up HDDs on my other machine in the process of implementing a real back up system… and I was like, hmmm let’s see what this ssd has to tell me while I’m there… well
And of course, just backing up without suspecting anything is the real rule, and that’s kind of what I was gonna do next

thefrog · February 20, 2025, 2:11pm

First if you suspect a drive is going bad STOP USING IT.

The more you use it, the lower the chance of recovery. There are recovery tools out there like photorec that can retrieve the files from the disk. Note that, again, you want to keep access to such a device to a minimum until you have it backed up if it is damaged. Once backed up you could run programs like testdisk and photorec. Note these are powerful tools that can do as much damage to the device as fix it so USER CAUTION ADVISED. Best to take it to a professional if you really need to recover and you have no personal experience.
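One way the "back it up first, then run recovery tools" order could look, as a dry-run sketch that only prints the commands. The use of GNU ddrescue for the imaging step is an assumption (the post does not name an imaging tool), and the image and map file paths are hypothetical. It also assumes the drive still answers reads at all.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# Assumed, not from the thread: GNU ddrescue as the imaging tool and the
# hypothetical output paths under /mnt/backup (a different, healthy disk).
plan_image_first() {
    src=$1   # failing device, e.g. /dev/nvme0n1
    img=$2   # image file on a healthy disk
    map=$3   # ddrescue map file, lets an interrupted run resume
    # Pass 1: copy everything that reads easily, skip the slow scraping phase.
    echo "sudo ddrescue -d -n $src $img $map"
    # Pass 2: go back to the bad areas with up to 3 retry passes.
    echo "sudo ddrescue -d -r3 $src $img $map"
    # Recovery tools then work on the image, not on the failing drive.
    echo "photorec $img"
}

plan_image_first /dev/nvme0n1 /mnt/backup/nvme.img /mnt/backup/nvme.map
```

Working on the image keeps testdisk and photorec away from the original device, so a mistake there costs a copy and not the data.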

dalto · February 20, 2025, 2:24pm

Can you share the output of lsblk -f

linuxislife · February 20, 2025, 2:34pm

I’ll try, the machine it is on is off the grid.

But I think lsblk just shows nvme0n1, the tree of partitions, and that’s it, I mean no label, no UUID, no nothing else

I go have a look

dalto · February 20, 2025, 2:46pm

Make sure you pass the -f option.

linuxislife · February 20, 2025, 2:49pm

no sorry, I mean with the -f option

i’ll log journalctl and dmesg on a key if that’d help…

so:

$ lsblk -f
NAME       FSTYPE   FSVER     LABEL       UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
loop0      squashfs 4.0                                                              0   100% /run/archiso/airootfs
sda        iso9660  Joliet Ex EOS_202502  2025-02-08-08-06-14-00
├─sda1     iso9660  Joliet Ex EOS_202502  2025-02-08-08-06-14-00                     0   100% /run/archiso/bootmnt
└─sda2     vfat     FAT32     ARCHISO_EFI 67A7-1076
sdb
└─sdb1     vfat     FAT32     CES_X64FREV 0001-D16F                              10.3G    31% /run/media/liveuser/CES_X64FREV
nvme0n1
├─nvme0n1p1
│
├─nvme0n1p2
│
├─nvme0n1p3
│
├─nvme0n1p4
│
└─nvme0n1p5

linuxislife · February 20, 2025, 3:06pm

journalctl

https://pastebin.com/gQUXMW9E

dmesg

https://pastebin.com/FUUYg5RL

I dunno if it’s totally toasted or what…

edit:
And, on a side note for future reference (or for my conscience idk ^^) is it possible to know if :

it is the fact that the computer bugged (because of the update for example) and because I REISUB it while a “smart” test was ongoing that it went south
or
the test stepped right into some sh*t, hence why it exploded (i.e. it’s not “because” of the reboot during test)

rabcor · February 20, 2025, 3:26pm

It shouldn’t matter if it’s encrypted, unless it’s hardware encrypted which is unlikely.

dalto · February 20, 2025, 3:33pm

That looks like failed hardware.

It probably isn’t anything you did to cause it. Sometimes hardware just fails.

linuxislife · February 20, 2025, 3:36pm

.

thefrog · February 20, 2025, 3:36pm

<3>[ 188.197686] Buffer I/O error on dev nvme0n1, logical block 122096624, async page read
<3>[ 188.202762] Buffer I/O error on dev nvme0n1p1, logical block 255984, async page read
<3>[ 188.202764] Buffer I/O error on dev nvme0n1p4, logical block 255984, async page read
<3>[ 188.202828] Buffer I/O error on dev nvme0n1p3, logical block 44211184, async page read
<3>[ 188.202834] Buffer I/O error on dev nvme0n1p2, logical block 32752, async page read
<3>[ 188.202926] Buffer I/O error on dev nvme0n1p5, logical block 618690432, async page read
<3>[ 188.202929] Buffer I/O error on dev nvme0n1p5, logical block 618690433, async page read
<3>[ 188.202931] Buffer I/O error on dev nvme0n1p5, logical block 618690434, async page read
<3>[ 188.202932] Buffer I/O error on dev nvme0n1p5, logical block 618690435, async page read
<3>[ 188.202934] Buffer I/O error on dev nvme0n1p5, logical block 618690436, async page read

As pointed out by @dalto looks like a disk failure issue. You should replace it.
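To see at a glance which devices those errors land on, a small sketch that counts "Buffer I/O error" lines per device from kernel log text on stdin. The function name `count_io_errors` is hypothetical. The sample lines are copied from the log excerpt above.

```shell
#!/bin/sh
# Count "Buffer I/O error" lines per device from kernel log text on stdin.
count_io_errors() {
    awk '/Buffer I\/O error on dev/ {
             dev = $0
             sub(/.*Buffer I\/O error on dev /, "", dev)  # drop text before the name
             sub(/,.*/, "", dev)                          # drop text after the name
             count[dev]++
         }
         END { for (d in count) print d, count[d] }' | LC_ALL=C sort
}

# Sample input taken from the excerpt above.
count_io_errors <<'EOF'
<3>[ 188.197686] Buffer I/O error on dev nvme0n1, logical block 122096624, async page read
<3>[ 188.202926] Buffer I/O error on dev nvme0n1p5, logical block 618690432, async page read
<3>[ 188.202929] Buffer I/O error on dev nvme0n1p5, logical block 618690433, async page read
EOF
```

On a live system the input would come from something like `sudo dmesg | count_io_errors`. Errors on the whole device and on every partition at once, as in this thread, point at the drive or controller, not at one damaged filesystem.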

linuxislife · February 20, 2025, 3:38pm

yep I totally saw those…

buggerrrrrrrrrrrrr

rabcor · February 20, 2025, 3:45pm

Run the smart test properly anyways, it’s possible that only your partition table got corrupted (as opposed to your ssd actually failing)

also try smartctl -H /dev/nvmex which is the basic check to see if the hard drive has completely failed.

It is a bit of a long shot, but it’s possible.
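Besides the text it prints, smartctl also reports its result in the exit status, as a bitmask. A small sketch for decoding it follows. The function name `decode_smartctl_status` is hypothetical, and the bit descriptions are paraphrased from the EXIT STATUS section of the smartctl man page, so check them against your installed version.

```shell
#!/bin/sh
# Decode the smartctl exit status bitmask into one line per set bit.
# Descriptions are paraphrased from the smartctl man page (EXIT STATUS).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no error bits set"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed or device did not identify" \
        "a SMART or other command to the disk failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Typical use (not run here): sudo smartctl -H /dev/nvme0; decode_smartctl_status $?
decode_smartctl_status 2
```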

linuxislife · February 20, 2025, 3:57pm

no no but like, i mean, it really won’t

[liveuser@eos-2025.02.08 ~]$ sudo smartctl -a /dev/nvme0n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13.1-arch2-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error

[liveuser@eos-2025.02.08 ~]$ sudo smartctl -H /dev/nvme0n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13.1-arch2-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error

[liveuser@eos-2025.02.08 ~]$ sudo smartctl -H /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13.1-arch2-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/nvme0 failed: Resource temporarily unavailable

MyNameIsRichard · February 20, 2025, 3:59pm

Yeah, it really seems like a hardware failure. Is it new enough to get a warranty exchange?

linuxislife · February 20, 2025, 4:02pm

probably not, and really it’s more about the data (always, right?)…

The worst part is at the result of the -short test I was like cool, this drive is brand new, it was like 5000h up, 7TB written… mmmmèèèèèhhhh