I started checking the SMART values of my NVMe devices. I have three such devices and they all show the same confusing values. Here is an example from my Samsung SSD 970 EVO Plus 1TB:
# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.14-zen1-1-zen] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 1TB
Serial Number: S4EWNG0M201473P
Firmware Version: 2B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1.000.204.886.016 [1,00 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1.000.204.886.016 [1,00 TB]
Namespace 1 Utilization: 453.251.903.488 [453 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 529150301b
Local Time is: Mon Dec 14 07:16:27 2020 CET
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.80W - - 0 0 0 0 0 0
1 + 6.00W - - 1 1 1 1 0 0
2 + 3.40W - - 2 2 2 2 0 0
3 - 0.0700W - - 3 3 3 3 210 1200
4 - 0.0100W - - 4 4 4 4 2000 8000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 102.002.681 [52,2 TB]
Data Units Written: 93.645.824 [47,9 TB]
Host Read Commands: 567.984.617
Host Write Commands: 403.730.298
Controller Busy Time: 1.830
Power Cycles: 1.516
Power On Hours: 919
Unsafe Shutdowns: 69
Media and Data Integrity Errors: 0
Error Information Log Entries: 2.085
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
Temperature Sensor 2: 43 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
The weird thing is that it says:
Data Units Written: 93.645.824 [47,9 TB]
Power On Hours: 919
The drive hosts my root and home directories. I can hardly believe that it has already written 47,9 TB, and that in only 919 hours, which works out to roughly 15 MB/s for every second of its powered-on lifetime. That seems far too much to me, and it does not align with
Percentage Used: 2%
Any idea how to best interpret these values?
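For anyone who wants to check the arithmetic: the NVMe specification defines one "data unit" as 1000 blocks of 512 bytes, so the raw counter can be converted by hand. The following is only a minimal sketch using the numbers from the output above; the function names are made up for illustration.

```python
# NVMe reports "Data Units" in thousands of 512-byte blocks,
# i.e. 512,000 bytes per unit.
BYTES_PER_DATA_UNIT = 512 * 1000


def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe data-unit counter to bytes."""
    return units * BYTES_PER_DATA_UNIT


def average_rate_mb_per_s(total_bytes: int, power_on_hours: int) -> float:
    """Average throughput in MB/s (decimal) over the reported power-on hours."""
    return total_bytes / (power_on_hours * 3600) / 1e6


# Values taken from the Samsung output above.
written = data_units_to_bytes(93_645_824)
rate = average_rate_mb_per_s(written, 919)

print(f"{written / 1e12:.1f} TB written")
print(f"{rate:.1f} MB/s averaged over power-on hours")
```

So the bracketed 47,9 TB is consistent with the raw counter, and the average comes out at about 14.5 MB/s. Note that this average is only as trustworthy as the Power On Hours value it divides by.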
Also very strange:
I bought the Samsung SSD 970 EVO Plus 1TB in March 2019 and I have a Kingston A2000 1TB which I bought in August 2020. That is 17 months later, but nevertheless the Kingston has
Power On Hours: 1.158
compared to the Samsung
Power On Hours: 919
This cannot be. Something is completely off with these SMART values. Any idea?
If you use a computer under control of power management, you should instruct smartd how to handle disks in low power mode. Usually, in response to SMART commands
I have a suspicion regarding the amount of data written: it just occurred to me that I ran extensive fio benchmarks on each of the NVMe devices with job sizes of 64 GB. That easily added up to several TB.
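To get a feeling for how quickly benchmark runs add up, here is a rough sketch. It assumes that one write job writes its full 64 GB exactly once and that GB means decimal gigabytes; the real volume depends on the fio job options (read vs. write mix, loops, runtime), which are not given here. The helper name is hypothetical.

```python
def passes_for_volume(volume_tb: float, job_size_gb: float) -> float:
    """Number of full write passes of job_size_gb needed to write volume_tb.

    Assumption: decimal units (1 TB = 1000 GB) and every pass writes
    the whole job size once.
    """
    return volume_tb * 1000 / job_size_gb


# About 78 passes of a 64 GB job already amount to 5 TB ...
print(round(passes_for_volume(5, 64)))
# ... while the full 47.9 TB would correspond to roughly 750 passes.
print(round(passes_for_volume(47.9, 64)))
```

A few dozen write passes explain "several TB" easily; whether benchmarks explain the whole total depends on how many runs there really were.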
That sounds like a probable culprit. I just had a look at my data, and I was startled too, on both machines! Here’s the output for my 970 EVO Plus:
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 30 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 35,797,203 [18.3 TB]
Data Units Written: 3,226,462 [1.65 TB]
Host Read Commands: 310,034,826
Host Write Commands: 82,538,644
Controller Busy Time: 499
Power Cycles: 42
Power On Hours: 754
Unsafe Shutdowns: 8
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 30 Celsius
Temperature Sensor 2: 31 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
18 TB read, and almost 2 TB written! On a mere 256 GB drive… Could be related to multiple installs of multiple OSes…
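Since the two outputs in this thread use different number formats (dots vs. commas as thousands separators, depending on locale), comparing them by eye is error-prone. A small sketch that parses either style and recomputes the size from the raw counter, again assuming 512,000 bytes per data unit as defined by NVMe; `parse_data_units` is just an illustrative helper, not part of smartmontools.

```python
import re

BYTES_PER_DATA_UNIT = 512 * 1000  # NVMe: 1000 blocks of 512 bytes


def parse_data_units(line: str) -> int:
    """Extract the raw counter from a 'Data Units ...' line.

    Takes the text between the colon and the bracketed size and
    drops '.' or ',' thousands separators and whitespace.
    """
    value = line.split(":", 1)[1].split("[", 1)[0]
    return int(re.sub(r"[.,\s]", "", value))


lines = (
    "Data Units Written: 93.645.824 [47,9 TB]",
    "Data Units Written: 3,226,462 [1.65 TB]",
)
for line in lines:
    units = parse_data_units(line)
    print(units, f"{units * BYTES_PER_DATA_UNIT / 1e12:.2f} TB")
```

Both bracketed values in the outputs agree with the raw counters, so smartctl's conversion itself is not the source of the surprise.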
Yes, this whole thing is still confusing. For me, one issue remains: the Power On Hours. My Kingston drive is more than a year younger than the Samsung drive and sits in the same computer, yet it has more than 200 hours more on the clock. That is weird.
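One way to put numbers on this is to divide each drive's Power On Hours by the days since purchase. The sketch below assumes purchase on the first of the month (the exact days are not given) and that both readings were taken on 14 Dec 2020, the date shown in the Samsung output.

```python
from datetime import date


def hours_per_day(power_on_hours: int, bought: date, read_on: date) -> float:
    """Average reported power-on hours per calendar day since purchase."""
    return power_on_hours / (read_on - bought).days


read_on = date(2020, 12, 14)  # "Local Time" from the smartctl output

# Assumption: bought on the 1st of the stated month.
samsung = hours_per_day(919, date(2019, 3, 1), read_on)
kingston = hours_per_day(1158, date(2020, 8, 1), read_on)

print(f"Samsung:  {samsung:.1f} h/day")
print(f"Kingston: {kingston:.1f} h/day")
```

That gives roughly 1.4 h/day for the Samsung against 8.6 h/day for the Kingston in the same machine. A possible explanation, though only a guess: the NVMe specification notes that Power On Hours may not include time the controller spent powered but in a non-operational power state, and the Samsung output above lists power states 3 and 4 as non-operational. If the two controllers count idle time differently, the counters would not be comparable.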
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 6,021,136 [3.08 TB]
Data Units Written: 9,491,544 [4.85 TB]
Host Read Commands: 38,712,103
Host Write Commands: 112,945,876
Controller Busy Time: 4,080
Power Cycles: 33
Power On Hours: 6,653
Unsafe Shutdowns: 11
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged