Are you able to see the graph data in the vSAN Performance graphs (which show things such as congestion) under Cluster/Host > Monitor > Performance?
If these and the Health checks for things such as Disk Balance (Cluster > Monitor > vSAN > Health) are functional then it is likely that the problem is on the vROPS side and you should consider moving this question to the vROPS sub-community (or asking a Mod to do so).
If these do not show the expected data, there are a few low-hanging-fruit troubleshooting steps you can take, such as restarting vsanmgmtd on the nodes, restarting the vSAN Health and Performance services on the vCenter, and ensuring you have any vendor-specific plug-ins/VIBs required for monitoring drives and other hardware components.
Thanks. The VSAN Health service appears to be fine, except that it doesn't seem to alarm on bad drives. These symptoms persist after reboot.
I will move this to the vROPS area. Thanks.
Could you please share a screenshot of the dashboard that is problematic or shows nothing?
Also, please confirm a few things:
--> Was this working earlier, or has it never worked since you configured the VSAN adapter?
--> Version of the VSAN management pack?
--> Versions of your vCenter and vSphere?
Thanks for the reply, and apologies for the belated response.
VSAN adapter in general was always working. vROPS was deployed greenfield as 6.7 (no migration, fresh install). vCenter is 6.7.0d, and ESXi is 6.7 but missing August patches.
* Most VSAN metrics for vROPS are fine and always have been. Just a few are suspect and always have been.
* We had a recent incident where VSAN drives were added and one drive was bad. The I/O degradation from the one bad drive brought the entire VSAN volume offline.
* We have had other incidents of bad drives that were neither alarmed on nor mitigated by VSAN/vROPS.
We are stable now, but looking at metrics in vROPS, these are the graphs that seem suspect:
15 and 18 are always 0. Array is all-flash.
21) This always shows 100% free. We run close to 80% capacity.
23/24) Always no errors on disk group when we know otherwise due to bad disks. Same with packet drops -- we can see a non-zero value on the VSAN dedicated switch ports.
22) Disks are not close to being balanced. If I hover over any block it shows disk utilization as zero -- which is 100% false.
26) This shows zero values 100% of the time -- including when we have bad drives (and outages from bad drives). Same for cache tier.
The vast majority of VSAN metrics are working fine. Disk latency, write buffers and many more all work fine. But the ones noted above do not seem to have valid metrics, and it has always been like this since vROPS was deployed new as 6.7.
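For anyone who wants to triage this outside the dashboards, here is a minimal Python sketch of flagging metric series that never change, which is the pattern described above (always 0, or always 100% free). It assumes the samples have already been exported into a plain dict by some other means. The function name, metric names and values are all made up for illustration. They are not real vROPS metric keys and not data from this environment.

```python
from typing import Dict, List


def find_flatlined_metrics(
    samples: Dict[str, List[float]], min_points: int = 3
) -> List[str]:
    """Return the names of metrics whose samples never vary.

    `samples` maps a metric name to its exported values (hypothetical format).
    Series with fewer than `min_points` values are skipped, since there is
    too little data to call them flat.
    """
    suspects = []
    for name, values in samples.items():
        if len(values) < min_points:
            continue
        # A series is "flat" when its largest and smallest values are equal.
        if max(values) == min(values):
            suspects.append(name)
    return sorted(suspects)


# Hypothetical example data, for illustration only.
example = {
    "disk_latency_ms": [1.2, 0.9, 1.4, 1.1],
    "capacity_free_pct": [100.0, 100.0, 100.0, 100.0],
    "disk_group_errors": [0.0, 0.0, 0.0, 0.0],
    "too_few_points": [0.0],
}

if __name__ == "__main__":
    print(find_flatlined_metrics(example))
```

A flat series is not proof that a metric is broken (an error counter can legitimately sit at zero), so this only narrows down which metrics to compare against what vCenter and the switch ports report.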
Did you turn on the vSAN Performance Service?
Ever get this fixed?