Can anyone advise on how to interpret the SMART SSD Wearout Indicator for this disk?
Is it 100% good (hence the Health Status is 'OK') or is it 100% worn out (if so, I think the Health Status should be showing at least a 'Warning')?
./smartinfo script output from ESXi host /usr/lib/vmware/vm-support/bin/

    SMART Data for Disk : t10.ATA_____SanDisk_SDSSDXPS480G____________________160933402734________
    Parameter                      Value  Threshold  Worst
    -----------------------------------------------------
    Health Status                  OK     N/A        N/A
    Media Wearout Indicator        100    0          100
    Write Error Count              N/A    N/A        N/A
    Read Error Count               N/A    N/A        N/A
    Power-on Hours                 253    0          100
    Power Cycle Count              100    0          100
    Reallocated Sector Count       100    0          100
    Raw Read Error Rate            N/A    N/A        N/A
    Drive Temperature              73     0          36
    Driver Rated Max Temperature   N/A    N/A        N/A
    Write Sectors TOT Count        253    0          253
    Read Sectors TOT Count         253    0          253
    Initial Bad Block Count        N/A    N/A        N/A
    -----------------------------------------------------

I am getting a weird VMFS issue with this disk where I have files and directories that I cannot delete.
Thanks
M
Thanks to Florian (#fgrehl) at Virten.net for posting this blog: "Determine TBW from SSDs with S.M.A.R.T Values in ESXi (smartctl)".
As per the blog, I installed the unsupported smartctl vib (smartctl-6.6-4321.x86_64.vib) on my ESXi 6.5 U1 lab host, and it gives so much more usable detail about the SSD disks. Why is this utility not included in the base ESXi code? (Come on VMware people, have a nice chat with the people at https://www.smartmontools.org/wiki and let's have a maintained and supported .vib version, please.)
./smartctl -d sat --all /dev/disks/t10.ATA_____SanDisk_SDSSDXPS480G____________________160933402734________ command output from ESXi Host /opt/smartmontools

    smartctl 6.6 2016-05-10 r4321 [x86_64-linux-6.5.0] (daily-20160510)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     Marvell based SanDisk SSDs
    Device Model:     SanDisk SDSSDXPS480G
    Serial Number:    160933402734
    LU WWN Device Id: 5 001b44 4a49bca08
    Firmware Version: X21200RL
    User Capacity:    480,103,981,056 bytes [480 GB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Form Factor:      2.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2 T13/2015-D revision 3
    SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Sun Mar 25 20:34:56 2018 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART Status not supported: Incomplete response, ATA output registers missing
    SMART overall-health self-assessment test result: PASSED
    Warning: This result is based on an Attribute check.

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 (    0) seconds.
    Offline data collection
    capabilities:                    (0x11) SMART execute Offline immediate.
                                            No Auto Offline data collection support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            No Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        (  10) minutes.

    SMART Attributes Data Structure revision number: 4
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       936
     12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       142
    166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       1
    167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       46
    168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       182
    169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       839
    171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
    172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
    173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       35
    174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       138
    184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
    194 Temperature_Celsius     0x0022   072   036   ---    Old_age   Always       -       28 (Min/Max 20/36)
    199 SATA_CRC_Error          0x0032   100   100   ---    Old_age   Always       -       0
    212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
    230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       272
    232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
    233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       19001
    241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       14613
    242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       3581
    244 Thermal_Throttle        0x0032   000   100   ---    Old_age   Always       -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    Selective Self-tests/Logging not supported

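In case it helps anyone who wants to pull the raw counters out of that attribute table with a script rather than by eye, here is a rough Python sketch. It assumes smartctl prints one attribute per line in the usual whitespace-separated columns (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value). `SAMPLE`, `ROW` and `parse_attributes` are my own hypothetical names, not part of smartmontools, and the sample rows are copied from the output above.

```python
import re

# A few attribute rows copied from the smartctl output in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       936
194 Temperature_Celsius     0x0022   072   036   ---    Old_age   Always       -       28 (Min/Max 20/36)
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       19001
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       14613
"""

# Assumed column layout: ID, name, flag, value, worst, thresh,
# type, updated, when_failed, then the raw value (leading integer only).
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\S+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: {...}} for every line that looks like an attribute row."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header line or anything else that is not an attribute row
        attrs[int(match.group("id"))] = {
            "name": match.group("name"),
            "value": int(match.group("value")),   # normalized value
            "worst": int(match.group("worst")),   # worst normalized value seen
            "thresh": match.group("thresh"),      # kept as text, may be '---'
            "raw": int(match.group("raw")),       # leading integer of RAW_VALUE
        }
    return attrs


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("Host writes (GiB):", attrs[241]["raw"])
    print("Power-on hours:   ", attrs[9]["raw"])
```

Only the leading integer of the raw column is kept, so the temperature row loses its "(Min/Max 20/36)" suffix. That was good enough for the counters I care about here.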
I have written 14.27 TiB (14,613 GiB, attribute 241 Total_Writes_GiB) in 936 power-on hours, an average of 0.015246269 TiB/hour.
The SanDisk Extreme Pro SATA SSD has a 10-year warranty with an 80 TBW endurance rating, which works out to an average of only 0.000913242 TB/hour.
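I am now kinda concerned that at this rate I could blow through the 80 TBW warranty limit after only about 219 days of power-on time in total (80 / 0.015246269 ≈ 5,247 hours), and 39 of those days (936 hours) have already gone!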
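For anyone who wants to redo this arithmetic for their own drive, here is a small Python sketch of the sums. It rests on a few assumptions of mine: the write rate stays constant, a year is 365 days, and the 80 TBW rating is treated as if it were 80 TiB (strictly TB and TiB differ by about 10%, so the real margin is a bit smaller). The function names are just made up for illustration.

```python
GIB_PER_TIB = 1024
HOURS_PER_DAY = 24


def write_rate_tib_per_hour(total_writes_gib, power_on_hours):
    """Average host write rate in TiB per power-on hour."""
    return total_writes_gib / GIB_PER_TIB / power_on_hours


def hours_until(endurance, rate_per_hour, already_written=0.0):
    """Hours needed to write the rest of `endurance` at a constant rate."""
    return (endurance - already_written) / rate_per_hour


# Raw values from the smartctl output (attributes 241 and 9).
TOTAL_WRITES_GIB = 14613
POWER_ON_HOURS = 936

# Warranty figures quoted in the post.
ENDURANCE_TBW = 80      # assumption: treated as TiB, like the figures above
WARRANTY_YEARS = 10

written_tib = TOTAL_WRITES_GIB / GIB_PER_TIB
rate = write_rate_tib_per_hour(TOTAL_WRITES_GIB, POWER_ON_HOURS)
warranty_rate = ENDURANCE_TBW / (WARRANTY_YEARS * 365 * HOURS_PER_DAY)

# Days of power-on time from a brand-new drive to 80 TBW at this rate.
total_days = hours_until(ENDURANCE_TBW, rate) / HOURS_PER_DAY
# Days left from now, after subtracting what has already been written.
remaining_days = hours_until(ENDURANCE_TBW, rate, written_tib) / HOURS_PER_DAY

if __name__ == "__main__":
    print(f"Written so far:        {written_tib:.2f} TiB")
    print(f"Actual write rate:     {rate:.9f} TiB/hour")
    print(f"Warranty-average rate: {warranty_rate:.9f} per hour")
    print(f"Days to 80 TBW (total):     {total_days:.0f}")
    print(f"Days to 80 TBW (remaining): {remaining_days:.0f}")
```

Note that these are power-on hours, not calendar time, so a lab host that is switched off part of the time will take longer in calendar days to get there.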
I am now thinking my VMFS issue might be down to this accelerated usage.
Looks like I am going to have to evacuate all VMs off the datastore and either run a VOMA check/fix or reformat the VMFS. (More writes either way.)
M