Hello Everyone,
I need assistance with a vSAN alert.
On one of our clusters we are getting the error "Virtual SAN device is under permanent failure", and the following health checks fail:
- Failed : Physical disk
- Failed : Component metadata health
- Failed : Overall disks health
I have gone through a couple of KBs and community threads:
VSAN health check - component metadata health
Component metadata health check fails with invalid state error (2145347) | VMware KB
ESXi host:
VMware ESXi 6.0.0 build-3620759
VMware ESXi 6.0.0 Update 2
vSAN Version:
Name : VMware-vsan-health Relocations: (not relocatable)
Version : 6.2.0 Vendor: VMware, Inc.
Release : 3547697 Build Date: Sat Feb 13 03:04:16 2016
Install Date: Thu Oct 13 18:12:01 2016 Build Host: sc-bld-lin1268.eng.vmware.com
Group : Applications/Management Source RPM: VMware-vsan-health-6.2.0-3547697.src.rpm
Size : 52872114 License: commercial
Signature : (none)
Summary : VMware Virtual SAN Health Service
Description :
VMware Virtual SAN Health Service
Distribution: (none)
vmkernel.log
2017-04-24T10:17:07.853Z cpu16:42460)PLOG: PLOG_QuiesceDevice:8531: : Got quiesce reason 1 on disk naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:07.853Z cpu7:33656)PLOG: PLOG_CleanupElevator:1473: Waiting for Elevator from UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.
2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOGGarbageCollectDevice:1542: Throttled: Device naa.600605b00991a3f0202de2c45f900beb:1 5296f94a-d540-efa9-e0e4-d7a2788d97ce is prepared to delete
2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1 0x419 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8
2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf270070 naa.600605b00991a3f0202de2c45f900beb:2 0x41d 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8
2017-04-24T10:17:11.369Z cpu36:41665)PLOG: PLOGNotifyDisks:4010: MD 3 with UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce with state 0 formatVersion 4 backing SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8 notified
2017-04-24T10:17:11.418Z cpu0:7034782)PLOG: PLOGGetRecoveredState:6637: Last LSN recoverd 5296f94a-d540-efa9-e0e4-d7a2788d97ce 46544828
2017-04-24T10:17:12.421Z cpu0:7034782)PLOG: PLOG_OpenDevHandles:1228: Registered APD callback for naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:12.424Z cpu0:7034782)PLOG: PLOG_OpenDevHandles:1228: Registered APD callback for naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:12.425Z cpu0:7034782)PLOG: PLOGInitAndAnnounceMD:6987: Successfully announced VSAN MD (naa.600605b00991a3f0202de2c45f900beb:2) with UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:12.530Z cpu26:43820)WARNING: LSOM: LSOMEventNotify:6440: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce is under permanent error.
2017-04-24T10:17:07.853Z cpu8:7034742)PLOG: PLOGValidateDiskGroupOpFn:1415: Issuing PLOG Op DISKGROUP UNMOUNT for MD :naa.600605b00991a3f0202de2c45f900beb
2017-04-24T10:17:07.853Z cpu16:42460)PLOG: PLOG_QuiesceDevice:8531: : Got quiesce reason 1 on disk naa.600605b00991a3f0202de2c45f900beb:2 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:07.853Z cpu32:41665)LSOM: LSOMEventNotify:6413: Throttled: Waiting for component cleanup
2017-04-24T10:17:07.853Z cpu7:33656)PLOG: PLOG_CleanupElevator:1473: Waiting for Elevator from UUID 5296f94a-d540-efa9-e0e4-d7a2788d97ce
2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.
2017-04-24T10:17:07.863Z cpu32:2341680)LSOM: LSOMEventNotify:6519: Throttled: Waiting for open component countto drop to zero
2017-04-24T10:17:07.872Z cpu29:36378)PLOG: PLOGIsPlogUnloading:100: Elevator exit for device is set
2017-04-24T10:17:07.872Z cpu29:36378)PLOG: PLOGElevBaseHandler:617: Elevator exiting due to unload operation
2017-04-24T10:17:07.974Z cpu8:33711)Global: Virsto_DetachInstance:301: INFO: Detaching Virsto Instance 0x430b680a9060 from PLOG device
2017-04-24T10:17:08.855Z cpu21:33659)PLOG: PLOG_CleanupDefence:6346: Waiting for defence task for naa.600605b00991a3f0202de2c45f900beb:1
2017-04-24T10:17:09.856Z cpu21:33659)Destroyed VSAN Slab PLOGIORetry_slab_0000000000 (maxCount=0 failCount=0)
2017-04-24T10:17:09.857Z cpu21:33659)Destroyed VSAN Slab PLOGIORetry_slab_0000000001 (maxCount=1 failCount=0)
2017-04-24T10:17:09.857Z cpu21:33659)ScsiEvents: 353: EventSubsystem: Device Events, Event Mask: 20, Parameter: 0x430cdde547e0, UnRegistered!
2017-04-24T10:17:09.857Z cpu3:7034742)PLOG: PLOGValidateDiskGroupOpFn:1415: Issuing PLOG Op DISKGROUP UNMOUNT for MD :naa.600605b00991a3f0202de2c45f900beb
2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOGGarbageCollectDevice:1542: Throttled: Device naa.600605b00991a3f0202de2c45f900beb:1 5296f94a-d540-efa9-e0e4-d7a2788d97ce is prepared to delete
2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1 0x419 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8
2017-04-24T10:17:09.857Z cpu4:33662)PLOG: PLOG_FreeDevice:496: Throttled: Waiting for ops to complete on device: 0x430cdf26f030 naa.600605b00991a3f0202de2c45f900beb:1
2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:325: PLOG in-mem device 0x430cdf270070 naa.600605b00991a3f0202de2c45f900beb:2 0x41d 5296f94a-d540-efa9-e0e4-d7a2788d97ce is being freed SSD 52cec8b9-4703-a9ad-aa5b-eaccb9b6f0e8
2017-04-24T10:17:09.867Z cpu9:33662)PLOG: PLOG_FreeDevice:454: Unregistering diskAttrHandle:0x430cdf2708b0 on disk naa.600605b00991a3f0202de2c45f900beb
2017-04-24T10:17:09.867Z cpu9:33662)LSOMCommon: LSOM_UnregisterDiskAttrHandle:136: DiskAttrHandle:0x430cdf2708b0 is removed from moduleID 86 for disk:naa.600605b00991a3f0202de2c45f900beb
2017-04-24T10:17:09.868Z cpu9:33662)Destroyed VSAN Slab PLOGIORetry_slab_0000000000 (maxCount=26 failCount=0)
2017-04-24T10:17:09.868Z cpu9:33662)Destroyed VSAN Slab PLOGIORetry_slab_0000000001 (maxCount=9 failCount=0)
2017-04-24T10:17:09.868Z cpu9:33662)ScsiEvents: 353: EventSubsystem: Device Events, Event Mask: 20, Parameter: 0x430cdf2720d0, UnRegistered!
2017-04-24T10:17:09.906Z cpu28:33528)WARNING: DVFilter: 1181: Couldn't enable keepalive: Not supported
2017-04-24T10:17:09.982Z cpu46:7034760)VSAN Device Monitor: Successfully unmounted failed VSAN disk naa.600605b00991a3f0202de2c45f900beb
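The log above is long, contains repeated lines, and is not in time order. A small filter can reduce it to the state changes for the failed device. Below is a minimal Python sketch. It assumes the vmkernel.log prefix format seen above (an ISO timestamp, then "cpuN:world)"). The EVENTS table, summarize() and KEYS are hypothetical helpers, built only from message fragments and identifiers that appear in this log.

```python
import re

# vmkernel.log prefix as seen above: "<ISO timestamp>Z cpuN:world)<message>"
LINE_RE = re.compile(r"^(?P<ts>\S+Z) cpu\d+:\d+\)(?P<msg>.*)$")

# Hypothetical mapping of message fragments (copied from the log above) to labels.
# This is an assumption for illustration, not an exhaustive list of vSAN events.
EVENTS = {
    "has gone offline": "offline",
    "is under permanent error": "permanent_error",
    "Successfully unmounted failed VSAN disk": "unmounted",
    "Successfully announced VSAN MD": "announced",
}

# Identifiers of the failed device taken from the log: vSAN disk UUID and naa ID.
KEYS = ("5296f94a-d540-efa9-e0e4-d7a2788d97ce", "naa.600605b00991a3f0202de2c45f900beb")

SAMPLE = [
    "2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.",
    "2017-04-24T10:17:12.530Z cpu26:43820)WARNING: LSOM: LSOMEventNotify:6440: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce is under permanent error.",
    "2017-04-24T10:17:09.982Z cpu46:7034760)VSAN Device Monitor: Successfully unmounted failed VSAN disk naa.600605b00991a3f0202de2c45f900beb",
    # Same line pasted twice, as in the excerpt above; it is de-duplicated below.
    "2017-04-24T10:17:07.863Z cpu32:2341680)WARNING: LSOM: LSOMEventNotify:6450: Virtual SAN device 5296f94a-d540-efa9-e0e4-d7a2788d97ce has gone offline.",
]


def summarize(lines, keys):
    """Return sorted, de-duplicated (timestamp, label) pairs for lines mentioning any key."""
    hits = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        if not any(k in msg for k in keys):
            continue
        for fragment, label in EVENTS.items():
            if fragment in msg:
                hits.add((m.group("ts"), label))
    # Same-format ISO-8601 UTC timestamps sort correctly as plain strings.
    return sorted(hits)


if __name__ == "__main__":
    for ts, label in summarize(SAMPLE, KEYS):
        print(ts, label)
```

On the host itself, the same helper could be pointed at the live log with summarize(open("/var/log/vmkernel.log"), KEYS). Matching on both the UUID and the naa ID matters because some messages name the device only by one of them.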
Regards,
Ali
Greetings!
This is a drive failure, and you need to replace the failed drive.
Cheers!
Shivam
Check whether the device in question is reported as failed or in predictive failure in the hardware logs. If you see errors at the hardware level, replace the disk.
Ensure the device firmware is supported for vSAN according to the VMware HCL, and update it if required.
As mentioned above, you need to replace the drive, but make sure you follow the correct steps:
VMware Virtual SAN Operations: Replacing Disk Devices - Virtual Blocks - VMware Blogs
Log in to the vSphere Web Client
Navigate to the Hosts and Clusters view and select the Virtual SAN enabled cluster
Go to the Manage tab and select Disk Management under the Virtual SAN section
Select the disk group with the failed magnetic device
Select the failed magnetic device and click the delete button
Take the failed drive out of the host and replace it. Make sure ESXi detects the new drive, then add the newly installed drive to the disk group
From your screenshot, you are using a pass-through configuration, so you don't need the extra step of creating a RAID 0 device. The steps above will be enough.
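The Web Client steps above have command-line equivalents in esxcli. Below is a minimal Python sketch that only prints them in order and does not execute anything. Several things in it are assumptions. The failed disk's naa ID comes from the vmkernel.log in this thread. NEW_DISK and CACHE_SSD are hypothetical placeholders, because neither ID appears here; look them up on the host first. replacement_plan() is a hypothetical helper. Check each command against the esxcli reference for your ESXi build before running it.

```python
# The failed capacity disk, taken from the vmkernel.log in this thread.
FAILED_DISK = "naa.600605b00991a3f0202de2c45f900beb"
# Hypothetical placeholders: not in the thread, must be looked up on the host.
NEW_DISK = "naa.<replacement-disk-id>"
CACHE_SSD = "naa.<cache-ssd-id>"


def replacement_plan(failed, new, ssd):
    """Return esxcli commands (as argv lists) mirroring the Web Client steps, in order."""
    return [
        # Show current vSAN disks and their disk-group membership.
        ["esxcli", "vsan", "storage", "list"],
        # Remove the failed capacity (magnetic) disk from its disk group.
        ["esxcli", "vsan", "storage", "remove", "-d", failed],
        # After physically swapping the drive, rescan so ESXi detects it.
        ["esxcli", "storage", "core", "adapter", "rescan", "--all"],
        # Confirm the new device is visible to the host.
        ["esxcli", "storage", "core", "device", "list", "-d", new],
        # Add the new disk to the disk group backed by the cache SSD.
        ["esxcli", "vsan", "storage", "add", "-s", ssd, "-d", new],
    ]


if __name__ == "__main__":
    # Dry run only: print the commands for review instead of executing them.
    for cmd in replacement_plan(FAILED_DISK, NEW_DISK, CACHE_SSD):
        print(" ".join(cmd))
```

Returning argv lists rather than shell strings keeps the plan easy to review. It could later be passed to subprocess.run() one step at a time, with the physical drive swap done between the remove and the rescan.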