VMware Cloud Community
mc1903cae
Enthusiast

SMART SSD 'Media Wearout Indicator'

Can anyone advise on how to interpret the SMART SSD 'Media Wearout Indicator' for this disk?

Is it 100% good (hence the Health Status of 'OK'), or is it 100% worn out (in which case I think the Health Status should be showing at least a 'Warning')?

Output of the ./smartinfo script, run on the ESXi host from /usr/lib/vmware/vm-support/bin/:

SMART Data for Disk : t10.ATA_____SanDisk_SDSSDXPS480G____________________160933402734________

Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                   OK      N/A     N/A
Media Wearout Indicator         100     0       100
Write Error Count               N/A     N/A     N/A
Read Error Count                N/A     N/A     N/A
Power-on Hours                  253     0       100
Power Cycle Count               100     0       100
Reallocated Sector Count        100     0       100
Raw Read Error Rate             N/A     N/A     N/A
Drive Temperature               73      0       36
Driver Rated Max Temperature    N/A     N/A     N/A
Write Sectors TOT Count         253     0       253
Read Sectors TOT Count          253     0       253
Initial Bad Block Count         N/A     N/A     N/A
-----------------------------------------------------
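SMART attributes are normally reported as normalized values that start high (often 100) and count down as the drive ages, and an attribute only trips when its value falls to or below a non-zero threshold; smartmontools treats a threshold of 0 as "always passing". Read that way, a Media Wearout Indicator of 100 with a worst of 100 would point to little or no wear rather than a worn-out drive, and with every threshold in this table at 0 these rows cannot raise a warning on their own. The Python sketch below applies those rules to a few rows copied from the table. parse_row and assess are hypothetical helpers, and the count-down convention is an assumption, since each vendor defines its own normalization.

```python
# Sketch only: interpret rows of the ESXi ./smartinfo table.
# Assumption: the usual SMART convention where normalized values count
# DOWN from a high starting point (often 100) as the drive ages.
SMARTINFO_ROWS = """\
Health Status                   OK      N/A     N/A
Media Wearout Indicator         100     0       100
Power-on Hours                  253     0       100
Reallocated Sector Count        100     0       100
Drive Temperature               73      0       36
"""


def parse_row(line):
    """Split a row into (name, value, threshold, worst).

    Parameter names contain spaces, so the last three fields are taken
    as the Value, Threshold and Worst columns.
    """
    fields = line.split()
    value, threshold, worst = fields[-3:]
    return " ".join(fields[:-3]), value, threshold, worst


def assess(value, threshold, worst):
    """Classify one row using common SMART normalized-value rules."""
    if not all(cell.isdigit() for cell in (value, threshold, worst)):
        return "not comparable"  # e.g. "OK" or "N/A" cells
    if int(threshold) == 0:
        # smartmontools treats a zero threshold as "always passing".
        return f"no threshold; lowest value seen {worst}"
    if int(value) <= int(threshold):
        return "FAILING"
    return f"above threshold; lowest value seen {worst}"


results = {}
for row in SMARTINFO_ROWS.splitlines():
    name, value, threshold, worst = parse_row(row)
    results[name] = assess(value, threshold, worst)
    print(f"{name:<26} {results[name]}")
```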

I am also seeing a weird VMFS issue with this disk: there are files and directories on it that I cannot delete.

Thanks

M

mc1903cae
Enthusiast

Thanks to Florian (#fgrehl) at Virten.net for the blog post "Determine TBW from SSDs with S.M.A.R.T Values in ESXi (smartctl)".

Following the blog, I installed the unsupported smartctl VIB (smartctl-6.6-4321.x86_64.vib) on my ESXi 6.5 U1 lab host, and it gives far more usable detail about the SSDs. Why is this utility not included in base ESXi? Come on VMware, have a nice chat with the people at https://www.smartmontools.org/wiki and let's have a maintained and supported VIB version, please!

Output of ./smartctl -d sat --all /dev/disks/t10.ATA_____SanDisk_SDSSDXPS480G____________________160933402734________ run on the ESXi host from /opt/smartmontools:

smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.5.0] (daily-20160510)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SDSSDXPS480G
Serial Number:    160933402734
LU WWN Device Id: 5 001b44 4a49bca08
Firmware Version: X21200RL
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Mar 25 20:34:56 2018 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       936
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       142
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       1
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       46
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       182
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       839
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       35
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       138
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   072   036   ---    Old_age   Always       -       28 (Min/Max 20/36)
199 SATA_CRC_Error          0x0032   100   100   ---    Old_age   Always       -       0
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       272
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       19001
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       14613
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       3581
244 Thermal_Throttle        0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
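Most of the useful numbers are in the RAW_VALUE column. As a hedged sketch, the Python below parses a few attribute rows copied from the output above, reads attributes 241 Total_Writes_GiB and 242 Total_Reads_GiB as GiB counts (as their names suggest), and flags any attribute whose normalized VALUE has dropped to or below a numeric THRESH. The column layout is assumed from this one smartctl 6.6 output; parse_attributes and Attr are hypothetical names, not part of smartmontools.

```python
from typing import NamedTuple, Optional

# A few attribute rows copied from the smartctl 6.6 output above.
SMARTCTL_ROWS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       936
194 Temperature_Celsius     0x0022   072   036   ---    Old_age   Always       -       28 (Min/Max 20/36)
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       14613
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       3581
"""


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: Optional[int]  # None when smartctl prints "---"
    raw: str               # kept as text: may contain "(Min/Max ...)"


def parse_attributes(text):
    """Map attribute ID -> Attr, assuming smartctl 6.6's column layout."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; this skips the header row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        thresh = None if fields[5] == "---" else int(fields[5])
        attrs[int(fields[0])] = Attr(
            int(fields[0]), fields[1], int(fields[3]), int(fields[4]),
            thresh, " ".join(fields[9:]),
        )
    return attrs


attrs = parse_attributes(SMARTCTL_ROWS)
written_tib = int(attrs[241].raw) / 1024  # Total_Writes_GiB -> TiB
read_tib = int(attrs[242].raw) / 1024     # Total_Reads_GiB -> TiB
# A non-zero threshold "trips" once the normalized VALUE falls to it or below.
failing = [a.name for a in attrs.values() if a.thresh and a.value <= a.thresh]

print(f"Host writes {written_tib:.2f} TiB, host reads {read_tib:.2f} TiB")
print("At/below threshold:", failing or "none")
```

On this drive only attribute 232 has a real threshold (4), and its value of 100 is well clear of it. If a smartctl 7.x build is available, its --json output would avoid text parsing altogether.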

I have written 14.27 TiB (14,613 GiB per attribute 241) in 936 power-on hours, an average of about 0.015246 TiB/hour.

The SanDisk Extreme Pro SATA SSD has a 10-year warranty and an 80 TBW endurance rating, which spread over 87,600 hours (10 years) is an average of only about 0.000913 TB/hour.

I am now kinda concerned that at this rate I could blow through my warranty's 80 TBW in roughly another 160 days of power-on time!
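smartctl reports GiB, while TBW endurance ratings are normally quoted in decimal terabytes, so comparing the two directly is slightly off. The Python sketch below redoes the arithmetic with everything in bytes, using the raw values of attributes 241 and 9. It assumes the host stays powered on (so power-on hours stand in for elapsed time) and that the current write rate continues unchanged; the constant names are illustrative only.

```python
# Sketch: project TBW usage from the SMART raw values in this thread.
GIB = 2**30
TIB = 2**40
TB = 10**12  # endurance (TBW) ratings are normally decimal terabytes

host_writes_gib = 14613          # attribute 241 Total_Writes_GiB (raw)
power_on_hours = 936             # attribute 9 Power_On_Hours (raw)
endurance_tb = 80                # 80 TBW rating quoted above
warranty_hours = 10 * 365 * 24   # 10-year warranty = 87,600 hours

written_bytes = host_writes_gib * GIB
rate_bytes_per_hour = written_bytes / power_on_hours
budget_bytes_per_hour = endurance_tb * TB / warranty_hours

# Assumes the current average write rate continues unchanged.
remaining_bytes = endurance_tb * TB - written_bytes
hours_left = remaining_bytes / rate_bytes_per_hour

print(f"Written so far: {written_bytes / TIB:.2f} TiB ({written_bytes / TB:.2f} TB)")
print(f"Current rate:   {rate_bytes_per_hour / TIB:.6f} TiB/hour")
print(f"TBW budget:     {budget_bytes_per_hour / TIB:.6f} TiB/hour")
print(f"Ratio:          {rate_bytes_per_hour / budget_bytes_per_hour:.1f}x")
print(f"80 TBW reached in about {hours_left / 24:.0f} more power-on days")
```

With the figures from this thread it comes out at roughly 18 times the write rate the TBW rating allows for, and about 160 more power-on days before 80 TB of host writes is reached, once the 14.27 TiB already written is taken into account.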

I am now wondering whether my VMFS issue might be down to this accelerated usage.

Looks like I am going to have to evacuate all VMs off the datastore and either run a VOMA check/fix or reformat the VMFS volume. (More writes either way, sadly.)

M
