SSD failure?

fbodymechanic · April 14, 2022, 3:48pm

This happened when I turned it on a couple of days ago. It was not the result of an update: I was not updating, and the machine had been on and off several times beforehand. I put the drive in an external case and could chroot and update from my other computer, but I still get this on boot.

DDG pretty much implies the drive is likely about to be completely dead, but I thought I’d see if any of y’all have experience with this one.

Everything is backed up. Smartctl short test passes.

Samsung Evo 850 drive.

[Image: PXL_20220414_152715795]

pebcak · April 14, 2022, 4:14pm

It says to run fsck manually. Did you?

fbodymechanic · April 14, 2022, 4:53pm

[Image: PXL_20220414_165249255]

ricklinux · April 14, 2022, 5:03pm

What does the long test show for smartctl? It takes a bit of time.

sudo smartctl --test=long /dev/sdX (replace sdX with your drive)
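
For reference, a minimal sketch of starting the extended test and checking on it afterwards. The helper names are made up for illustration and /dev/sdX is a placeholder. The progress parsing assumes that smartctl -c prints a line of the form "NN% of test remaining" while a test is running, which may differ between versions.

```shell
# Hypothetical helper names; pass the real device, e.g. /dev/sdb (check with lsblk).

# Start the extended self-test. The command returns immediately and the
# test keeps running inside the drive.
start_long_test() { sudo smartctl -t long "$1"; }

# Show the self-test log (newest entry first) once the test has finished.
selftest_log() { sudo smartctl -l selftest "$1"; }

# Read `smartctl -c` output on stdin and print the remaining percentage,
# if a test is in progress. Assumed output format, see the note above.
selftest_remaining() {
    grep -o '[0-9][0-9]*% of test remaining'
}

# Usage:
#   start_long_test /dev/sdX
#   sudo smartctl -c /dev/sdX | selftest_remaining
#   selftest_log /dev/sdX
```

The test runs in the background, so the machine can be used in the meantime, but it should stay powered on until the log shows the test as completed.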

pebcak · April 14, 2022, 5:05pm

I would say Y to everything:

fsck -y /dev/sdb4

For verbose:

fsck -yV /dev/sdb4
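
As a rough sketch of what these flags do: -y answers "yes" to every repair prompt, and -V makes fsck print what it is doing. fsck should only be run on a filesystem that is not mounted, so a small guard like the one below can help. The function names are hypothetical, and the check assumes the partition shows up in /proc/mounts under the same device name you pass in.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE is listed as a mount source (default file: /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# safe_fsck DEVICE
# Refuses to run on a mounted partition; otherwise repairs with -y (answer
# yes to all prompts) and -V (verbose).
safe_fsck() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it or boot from live media first" >&2
        return 1
    fi
    sudo fsck -yV "$1"
}

# Usage: safe_fsck /dev/sdb4
```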

fbodymechanic · April 14, 2022, 5:05pm

It’s several hours if I recall. I’ll try and set it up in a bit.

ricklinux · April 14, 2022, 5:14pm

I think it’s probably software errors (if it passes the long test). Maybe fsck will correct that?

fbodymechanic · April 14, 2022, 5:16pm

So, I’d be lying if I said I knew what that did, but it worked. I watched -yV go for about 2 minutes straight and now it seems good.

Thank you.

The real question though, is the drive failing? Or what caused that in the first place?

pebcak · April 14, 2022, 5:18pm

You are welcome! Glad it worked!

Let’s hope it is not!
To make sure, run the smartctl test when you find time.

ricklinux · April 14, 2022, 5:19pm

Run the long test, and if it comes back good I would say it was a file system error (corruption).

EOS · April 14, 2022, 5:20pm

Show the SMART attributes: sudo smartctl -a /dev/sdb
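
If only a few attributes are of interest, a small filter can pull their raw values out of the table. This is a sketch only: smart_raw is a made-up helper name, and it assumes the usual ten-column ATA attribute layout, with RAW_VALUE in the last column.

```shell
# smart_raw ID
# Reads `smartctl -A` (or -a) output on stdin and prints the RAW_VALUE
# column for the attribute with the given ID.
smart_raw() {
    # NF >= 10 skips other lines that merely start with the same number.
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}

# Usage:
#   sudo smartctl -A /dev/sdb | smart_raw 199    # CRC_Error_Count
#   sudo smartctl -A /dev/sdb | smart_raw 235    # POR_Recovery_Count
```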

fbodymechanic · April 14, 2022, 6:11pm

Long test is running. 4.5 hours to go.

fbodymechanic · April 14, 2022, 10:45pm

I guess it’s good?

[derek@gnome ~]$ sudo smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.33-1-lts] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 EVO 500GB
Serial Number:    S2RANX0H808243R
LU WWN Device Id: 5 002538 d412d9ad0
Firmware Version: EMT02B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 14 15:43:43 2022 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 265) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       2091
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1297
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       5
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   066   054   000    Old_age   Always       -       34
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       3
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       90
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6238878067

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2091         -
# 2  Short offline       Completed without error       00%      2083         -
# 3  Extended offline    Aborted by host               90%      2062         -
# 4  Short offline       Completed without error       00%      2062         -
# 5  Short offline       Completed without error       00%      1850         -
# 6  Short offline       Completed without error       00%       832         -
# 7  Short offline       Aborted by host               90%       768         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[derek@gnome ~]$

EOS · April 15, 2022, 5:43am

The SMART values indicate that the SSD is in good condition.
And only about 3 TB has been written to it so far.
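
The written-data figure can be reproduced from attribute 241 (Total_LBAs_Written) in the output above. A minimal sketch, assuming the raw value counts logical sectors of 512 bytes (the size shown in the "Sector Size" line); lbas_to_tb is a hypothetical helper name.

```shell
# lbas_to_tb RAW_LBAS [SECTOR_BYTES]
# Converts a raw Total_LBAs_Written value to decimal TB and binary TiB.
lbas_to_tb() {
    awk -v lbas="$1" -v ss="${2:-512}" 'BEGIN {
        bytes = lbas * ss
        # 1 TB = 10^12 bytes, 1 TiB = 2^40 bytes = 1099511627776 bytes
        printf "%.2f TB (%.2f TiB)\n", bytes / 1e12, bytes / 1099511627776
    }'
}

lbas_to_tb 6238878067    # prints: 3.19 TB (2.91 TiB)
```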

I have two of those same discs. A couple of things therefore make me wonder:

First, I have not seen this before:

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing

On my disks:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

But according to this page it may just be a (fixed) bug:
smartmontools.org

But why does the ‘bug’ appear on your SSD and not on mine?
Same disks, same OS and same firmware…

And the second:

199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       3
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       90

The latter can perhaps explain the former.
POR_Recovery_Count (unexpected power loss) is quite high in relation to the operating hours.
Those power losses may have caused the harmless CRC errors.
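
To put a number on "quite high", the count can be compared with the power cycles and power-on hours from the posted output (attributes 12 and 9). This is only a rough sketch with a hypothetical helper name, and reading the ratio as "share of power cycles that ended uncleanly" is an assumption.

```shell
# por_stats POR_COUNT POWER_CYCLES POWER_ON_HOURS
# Relates the POR_Recovery_Count to power cycles and power-on hours.
por_stats() {
    awk -v por="$1" -v cycles="$2" -v hours="$3" 'BEGIN {
        if (por == 0 || cycles == 0) { print "nothing to compare"; exit }
        printf "%.1f%% of power cycles, one per %.1f power-on hours\n",
               100 * por / cycles, hours / por
    }'
}

por_stats 90 1297 2091    # prints: 6.9% of power cycles, one per 23.2 power-on hours
```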

fbodymechanic · April 15, 2022, 11:35am

Maybe it’s because it’s a laptop? Are you using yours in a desktop or laptop?

anon13373109 · April 15, 2022, 11:53am

199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       478

Mine says 478, which I find weird because I’m pretty sure there wasn’t a power loss every couple of days. It’s an external SSD I use for my Pi 4b server, which I reboot maybe once every 2 weeks.

I’m not the only one:

EOS · April 15, 2022, 3:17pm

Desktop.

EOS · April 15, 2022, 4:24pm

POR Recovery Count
A count of the number of sudden power off cases. If there is a
sudden power off, the firmware must recover all of the mapping
and user data during the next power on. This is a count of the
number of times this has happened.
Samsung SSD Note

Maybe some power-saving state increases the figure by 1 each time. I have no experience with the Pi.