VMware Cloud Community
KKSAdmin
Enthusiast

Does VSAN collect health metrics for NON-passthrough disks?

Let's say you have an older controller that doesn't support pass-through and you are creating virtual RAID 0 disks from the controller.

Is VSAN still able to collect SMART metrics (e.g. predictive failure) for those disks?

For example, if I run esxcli storage core device list, I only see information for the virtual disks the controller is presenting and not the actual physical disks. Is VSAN Health Check also limited to the virtual disk presentation?
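To make that limitation concrete, here is a minimal Python sketch that groups the output of esxcli storage core device list by device and prints the vendor and model fields. The sample text is illustrative only (not captured from a real host), and the helper name parse_device_list is hypothetical. It assumes the usual layout of an unindented device identifier followed by indented "Key: Value" lines. Behind a RAID 0 wrapper, the model reported would typically be the controller's logical volume rather than the physical drive.

```python
def parse_device_list(text):
    """Group `esxcli storage core device list` output into {device: {key: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new device record.
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            # Indented "Key: Value" line belonging to the current device.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


# Illustrative sample only; identifiers and values are made up.
SAMPLE = """\
naa.600508b1001c0000000000000000aaaa
   Display Name: Local HP Disk (naa.600508b1001c0000000000000000aaaa)
   Vendor: HP
   Model: LOGICAL VOLUME
   Is SSD: false
naa.600508b1001c0000000000000000bbbb
   Display Name: Local HP Disk (naa.600508b1001c0000000000000000bbbb)
   Vendor: HP
   Model: LOGICAL VOLUME
   Is SSD: true
"""

for name, fields in parse_device_list(SAMPLE).items():
    print(name, fields.get("Vendor"), fields.get("Model"))
```

If every entry reports the controller's logical volume as its model, anything that reads device identity from this layer sees the wrapper, not the drive behind it.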

2 Replies
TheBobkin
Champion

Hello KKSAdmin,

"Is VSAN still able to collect SMART metrics (i.e. predictive failure, etc) information for those disks?"

I think it may actually vary for some aspects depending on the controller and its associated capabilities/drivers/plug-ins, but it probably depends more on what type of "failure" we are talking about, and potentially even on the vSAN version in use. DDH (Dying Disk Handling) has (thankfully!) always been there in some shape or form, but how it behaves does vary:

https://kb.vmware.com/s/article/2148358

Typically, SCSI sense codes from the controller (referencing the naa IDs and/or mount points such as mpx.vmhbax:Cx:Tx:Lx) will of course still function and show up in vmkernel.log when there are issues with devices.

Also, something such as the device latency VOB is agnostic of how the device is attached and thus will still inform you if a device is not responding in a timely manner.
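As a rough illustration of what to look for in vmkernel.log, here is a minimal Python sketch that pulls the device name and sense data out of a failed-command line. The sample line is illustrative (not taken from a real host), the helper name parse_sense is hypothetical, and the regex assumes the common "failed H:.. D:.. P:.. Valid sense data: ..." layout, so adjust it if your build logs differently. The sense key names are the standard SCSI ones.

```python
import re

# Standard SCSI sense key names (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Assumed layout of a failed SCSI command line in vmkernel.log.
PATTERN = re.compile(
    r'to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-fA-F]+) D:(?P<device>0x[0-9a-fA-F]+) P:(?P<plugin>0x[0-9a-fA-F]+)'
    r'(?: (?:Valid|Possible) sense data: '
    r'(?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) (?P<ascq>0x[0-9a-fA-F]+))?'
)


def parse_sense(line):
    """Return device and decoded sense data, or None if the line has none."""
    m = PATTERN.search(line)
    if m is None or m.group("key") is None:
        return None
    key = int(m.group("key"), 16)
    return {
        "device": m.group("dev"),
        "sense_key": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": int(m.group("asc"), 16),
        "ascq": int(m.group("ascq"), 16),
    }


# Illustrative log line only; the device ID and command details are made up.
SAMPLE_LINE = (
    '2018-03-01T10:15:30.123Z cpu4:33000)ScsiDeviceIO: 2927: Cmd(0x439d802ec580) 0x28, '
    'CmdSN 0x1f3 from world 0 to dev "naa.600508b1001c0000000000000000aaaa" '
    'failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x0.'
)

print(parse_sense(SAMPLE_LINE))
```

A sense key of MEDIUM ERROR or HARDWARE ERROR recurring against the same device is the kind of pattern worth checking, even when the device name is the controller's virtual disk rather than the physical drive.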

I meant to ask you on your other post what you meant by "The VSAN Health service appears to be fine other than that they don't seem to be alarming on bad drives. "

Which Health check info is shown as warning/error and what is the info in the drop-down?

Bob

KKSAdmin
Enthusiast

Well the problem is sort of that we are NOT seeing any physical disk errors reported in the VSAN Health Service.

We've had to replace probably 3 drives in the past quarter, and our only means of finding them seems to be investigating which disk group, and then ultimately which disk, has the bad performance metrics. We never get any indication in the VSAN Health Service or vROPS. I just went through some basic queries in Log Insight (just filtering on the defined VSAN fields with "exists") and found maybe a handful of events over the past week (and our week was more eventful than that).

So it seems to me that somehow the metrics we want aren't getting collected at some level. That's why I asked the question above, because when you list drives in ESXCLI you just see the virtual drives presented by the controller and not the actual drives themselves. So this is me trying to understand why I'm not seeing more in the way of alarms/logs/metrics specifically for physical disk status.

Most other metrics and alarms seem to be fine, but I'm just not seeing anything for the physical disks.
