NVMe confusing SMART values

I started checking the SMART values of my NVMe devices. I have three such devices and they all show the same confusing values. Here is an example from my Samsung SSD 970 EVO Plus 1TB:

3# smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.14-zen1-1-zen] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      S4EWNG0M201473P
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1.000.204.886.016 [1,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1.000.204.886.016 [1,00 TB]
Namespace 1 Utilization:            453.251.903.488 [453 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 529150301b
Local Time is:                      Mon Dec 14 07:16:27 2020 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    102.002.681 [52,2 TB]
Data Units Written:                 93.645.824 [47,9 TB]
Host Read Commands:                 567.984.617
Host Write Commands:                403.730.298
Controller Busy Time:               1.830
Power Cycles:                       1.516
Power On Hours:                     919
Unsafe Shutdowns:                   69
Media and Data Integrity Errors:    0
Error Information Log Entries:      2.085
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               37 Celsius
Temperature Sensor 2:               43 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

The weird thing is that it says:

Data Units Written:                 93.645.824 [47,9 TB]
Power On Hours:                     919

The drive hosts my root and home directories. I can hardly believe that it has already written 47.9 TB, and in only 919 hours, which works out to about 14.5 MB/s for every second of its powered-on lifetime. That seems far too much to me, and it does not align with

Percentage Used: 2%

Any idea how to best interpret these values?
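The figures can at least be cross-checked with a little arithmetic. The sketch below assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes (512,000 bytes), which matches the 47,9 TB that smartctl prints next to the raw counter. The function names are only illustrative.

```python
# One NVMe "data unit" is 1000 * 512 bytes (assumed from the NVMe SMART/Health log definition).
BYTES_PER_DATA_UNIT = 1000 * 512

def data_units_to_tb(units: int) -> float:
    """Convert NVMe data units to decimal terabytes (1 TB = 10**12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 1e12

def average_rate_mb_s(units: int, power_on_hours: int) -> float:
    """Average throughput in MB/s over the reported power-on time."""
    return units * BYTES_PER_DATA_UNIT / 1e6 / (power_on_hours * 3600)

# Values copied from the Samsung output above.
written_tb = data_units_to_tb(93_645_824)
rate = average_rate_mb_s(93_645_824, 919)
print(f"{written_tb:.1f} TB written, {rate:.1f} MB/s on average")
```

So the bracketed TB value is consistent with the raw counter. The open question is only whether the counter itself, or the Power On Hours it is divided by, reflects real usage.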

Also very strange:

I bought the Samsung SSD 970 EVO Plus 1TB in March 2019, and I have a Kingston A2000 1TB which I bought in August 2020. That is 17 months later, but nevertheless the Kingston has

Power On Hours: 1.158

compared to the Samsung

Power On Hours: 919

This cannot be right. Something is seriously off with these SMART values. Any ideas?

Did you compare the data with gsmartcontrol? Do you use power management on your machine?

Check on Arch:

If you use a computer under control of power management, you should instruct smartd how to handle disks in low power mode. Usually, in response to SMART commands

I have a suspicion regarding the amount of data written: it just occurred to me that I ran extensive fio benchmarks on each of the NVMe devices with job sizes of 64 GB. That was easily worth several TB. :blush:
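As a rough sanity check on the fio theory, here is a small sketch of how much 64 GB jobs add up to. It assumes each job writes its full 64 GB exactly once and uses decimal units. The run count of 50 is purely hypothetical, since the actual number of benchmark runs is not known.

```python
def tb_written_by_runs(runs: int, job_size_gb: float = 64) -> float:
    """Total decimal TB written if every run writes job_size_gb once (assumption)."""
    return runs * job_size_gb / 1000

def runs_needed(total_tb: float, job_size_gb: float = 64) -> float:
    """Number of full-size runs needed to account for total_tb on their own."""
    return total_tb * 1000 / job_size_gb

# Hypothetical: 50 benchmark runs of 64 GB each.
print(f"50 runs write {tb_written_by_runs(50):.1f} TB")

# How many such runs would explain the whole 47.9 TB counter by themselves?
print(f"47.9 TB corresponds to about {runs_needed(47.9):.0f} runs")
```

A few dozen runs do add up to several TB, but it would take hundreds of them to explain the whole counter, so the benchmarks are probably only part of the story.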

That sounds like a probable culprit. I just had a look at my data, and I was startled too, on both machines! Here’s the output for my 970 EVO Plus:

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    35,797,203 [18.3 TB]
Data Units Written:                 3,226,462 [1.65 TB]
Host Read Commands:                 310,034,826
Host Write Commands:                82,538,644
Controller Busy Time:               499
Power Cycles:                       42
Power On Hours:                     754
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               30 Celsius
Temperature Sensor 2:               31 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

18 TB read, and almost 2 TB written! On a mere 256 GB drive… Could be related to multiple installs of multiple OSes… :grin:

Yes, this whole thing is still confusing. For me, one issue remains: the Power On Hours. My Kingston drive is more than a year younger than the Samsung drive and sits in the same computer, yet it has over 200 hours more on the clock. That is weird.
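To put the discrepancy in numbers, the following sketch computes the average counted hours per calendar day for both drives. Only the purchase months are known, so the 1st of each month is an assumption, and the check date is taken from the "Local Time" line of the Samsung output.

```python
from datetime import date

def hours_per_day(power_on_hours: int, bought: date, checked: date) -> float:
    """Average powered-on hours per calendar day between purchase and the check."""
    days = (checked - bought).days
    return power_on_hours / days

# Snapshot date taken from the "Local Time" line of the smartctl output.
checked = date(2020, 12, 14)

# ASSUMPTION: only the purchase months are known, so the 1st is used for both.
samsung = hours_per_day(919, date(2019, 3, 1), checked)
kingston = hours_per_day(1158, date(2020, 8, 1), checked)

print(f"Samsung:  {samsung:.1f} h/day")
print(f"Kingston: {kingston:.1f} h/day")
```

With these assumed dates the Kingston averages several times more counted hours per day than the Samsung, even though both sit in the same machine. One possible explanation, not confirmed anywhere in this thread, is that the NVMe specification allows Power On Hours to exclude time spent in non-operational (low-power) states, so different drives and firmware may count differently.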

So the command in terminal is smartctl -a /dev/nvme0n1 to get the info on my nvme drive?

Yes
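If you want to compare counters from several drives, the text output can be parsed with a few lines of Python. This is only a sketch: the helper name `parse_counter` is made up for illustration, and it handles integer counters only. It strips both `.` and `,` as thousands separators, since the outputs in this thread use different locales.

```python
import re

def parse_counter(output: str, field: str) -> int:
    """Return the integer value of a smartctl counter line such as 'Data Units Written'."""
    pattern = rf"^{re.escape(field)}:\s+(\d[\d.,]*)"
    match = re.search(pattern, output, flags=re.MULTILINE)
    if match is None:
        raise KeyError(field)
    # Drop locale-dependent thousands separators ('.' or ',').
    return int(re.sub(r"[.,]", "", match.group(1)))

# Two lines in each of the number formats seen in this thread.
sample_dots = (
    "Data Units Written:                 93.645.824 [47,9 TB]\n"
    "Power On Hours:                     919\n"
)
sample_commas = (
    "Data Units Written:                 3,226,462 [1.65 TB]\n"
    "Power On Hours:                     754\n"
)

print(parse_counter(sample_dots, "Data Units Written"))
print(parse_counter(sample_commas, "Data Units Written"))
```

In practice the `output` string would be the captured text of `smartctl -a /dev/nvme0n1`.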

Thanks!

[dad@archlinux ~]$ smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.14-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/nvme0n1 failed: Permission denied
[dad@archlinux ~]$

:thinking:

sudo?

Haha of course, silly me lmao. :crazy_face: :rofl:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.14-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKNW512G8
Serial Number: BTNH94041FQV512A
Firmware Version: 002C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Dec 14 12:45:51 2020 EST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 77 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000    5000
 4 -   0.0040W       -        -    4  4  4  4     5000    9000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 6,021,136 [3.08 TB]
Data Units Written: 9,491,544 [4.85 TB]
Host Read Commands: 38,712,103
Host Write Commands: 112,945,876
Controller Busy Time: 4,080
Power Cycles: 33
Power On Hours: 6,653
Unsafe Shutdowns: 11
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

[dad@archlinux ~]$

Here's mine to compare with.

[ricklinux@eos-xfce ~]$ sudo smartctl -a /dev/nvme0n1
[sudo] password for ricklinux: 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.9.13-arch1-1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDS500G3X0C-00SJG0
Serial Number:                      183933806864
Firmware Version:                   102000WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b44b67f66
Local Time is:                      Mon Dec 14 12:53:56 2020 EST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     4000   10000
 4 -   0.0025W       -        -    4  4  4  4     4000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    6,739,002 [3.45 TB]
Data Units Written:                 16,290,017 [8.34 TB]
Host Read Commands:                 86,870,309
Host Write Commands:                82,131,896
Controller Busy Time:               79
Power Cycles:                       755
Power On Hours:                     2,306
Unsafe Shutdowns:                   108
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

[ricklinux@eos-xfce ~]$ 

I guess I win with the highest unsafe shutdowns? That’s what happens when VirtualBox modules are messed up or you try to use Deepin. :rofl:
