Hi all,
I have a Large 6.6.1 cluster. We have 5 Master/Data nodes and 4 Remote Collectors. There are 4 vCenter adapters and 1 vSAN adapter.
I switched from the MPSD to the vSAN adapter this past Monday. I have removed the old MPSD adapter and have confirmed the objects it monitored have been removed as well.
Since switching over to the vSAN adapter, I have noticed that pretty much all of the objects in that instance have a collection status of "Old Data Receiving". Some objects are collecting, some have old data. I know that "Old Data Receiving" means the data is not current and is behind by 5 polling cycles.
My questions:
- Does this status simply mean there is no new data?
- Should I be concerned with not receiving actual data to trigger the alerts?
Thanks for any and all help!
Dumb question, but is the vSAN performance service turned on for the vSAN clusters?
From the 6.5 docs:
When you create a vSAN cluster, the performance service is disabled. Turn on vSAN performance service to monitor the performance of vSAN clusters, hosts, disks, and VMs.
When you turn on the performance service, vSAN places a Stats database object in the datastore to collect statistical data. The Stats database is a namespace object in the cluster's vSAN datastore.
Sorry for the late response, I wanted to get some information from VMware regarding this.
I checked that the health/performance service was on, and it is. I have spoken to VMware about this and it seems a number of customers are having the same issue. VMware has stated it is a bug in the code: the metrics are not being collected in a timely manner and the collection is timing out. This is why the metrics are not up to date.
I have installed the hotfix they provided for the vSAN adapter, but the issue persists. Most, if not all, affected customers are still seeing it: some metrics are coming in correctly, but most are still behind in collection.
Engineering is looking into this further, and another hotfix is supposed to be released; there is no date for it yet.
I will update as I get back more information.
I have the same issue at my large customer. We also have the new hotfix applied and still see the same issue.
Back to engineering to tweak the call to the vSAN API.
I wanted to come back and revisit this, as we might have this resolved for now. I installed the vSAN hotfix (2.0.0.7192536) for our Large environments. Actually, all of our vROps clusters are Large environments, but we do have some vSAN adapters with 200-900 objects.
Along with the hotfix above, I also put in some workarounds that were suggested by VMware engineering. We made changes to the file below.
File: /usr/lib/vmware-vcops/user/plugins/inbound/VirtualAndPhysicalSANAdapter3/conf/config.properties
Original File:
# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5
# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
# VIM Client read timeout (ms)
VIMCLIENT_READ_TIMEOUT = 120000
Edited File:
# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5
# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
# VIM Client read timeout (ms)
# VIMCLIENT_READ_TIMEOUT = 120000
# VMWARE WORKAROUNDS TO VSAN BUG
CIM_SERVICE_PROTOCOL = none
VIMCLIENT_READ_TIMEOUT = 900000
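To double-check which settings actually differ between the two listings above, a small script can parse both and compare them. This is only a sketch: the helper names parse_properties and diff_properties are made up for illustration and are not part of vROps, and the parser is deliberately simplified (it handles only KEY = VALUE lines and # comments, not the full Java properties syntax).

```python
def parse_properties(text):
    """Parse simple KEY = VALUE lines. Lines starting with '#' are comments.

    If a key appears more than once, the last occurrence wins.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def diff_properties(before, after):
    """Return {key: (old, new)} for keys that were added, removed, or changed."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# The two listings from this post, copied as-is.
ORIGINAL = """\
RESOURCE_DISCOVERY_FREQUENCY = 5
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
VIMCLIENT_READ_TIMEOUT = 120000
"""

EDITED = """\
RESOURCE_DISCOVERY_FREQUENCY = 5
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
# VIMCLIENT_READ_TIMEOUT = 120000
# VMWARE WORKAROUNDS TO VSAN BUG
CIM_SERVICE_PROTOCOL = none
VIMCLIENT_READ_TIMEOUT = 900000
"""

changes = diff_properties(parse_properties(ORIGINAL), parse_properties(EDITED))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Only two settings differ: CIM_SERVICE_PROTOCOL is new, and VIMCLIENT_READ_TIMEOUT goes from 120000 to 900000.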
I now have this installed on 2 clusters that were seeing the "Old Data Receiving" issue. We do seem to be receiving metrics now, although some of the vSAN metrics seem to be 20 minutes behind, which I would expect since we increased the VIMCLIENT_READ_TIMEOUT. I will wait for the weekend to pull a full log bundle to confirm the cluster is no longer seeing the errors pertaining to the vSAN bug.
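For a rough sense of what the timeout change means in collection cycles, the arithmetic can be sketched as below. The 5-minute collection interval is an assumption (it is not stated anywhere in this thread), and the function names are just illustrative.

```python
import math

# ASSUMPTION: a 5-minute collection interval; adjust to your adapter's setting.
COLLECTION_INTERVAL_MIN = 5


def ms_to_minutes(timeout_ms):
    """Convert a timeout in milliseconds to minutes."""
    return timeout_ms / 60000


def cycles_spanned(timeout_ms, interval_min=COLLECTION_INTERVAL_MIN):
    """Number of collection cycles a call could span if it runs to the full timeout."""
    return math.ceil(ms_to_minutes(timeout_ms) / interval_min)


for label, timeout_ms in (("original", 120000), ("workaround", 900000)):
    print(
        f"{label}: {ms_to_minutes(timeout_ms):.0f} min timeout, "
        f"up to {cycles_spanned(timeout_ms)} cycle(s)"
    )
```

Under that assumption, the original 120000 ms is a 2-minute timeout that fits inside one cycle, while 900000 ms is 15 minutes and can span 3 cycles, so a lag in the 15-20 minute range would not be surprising when calls run close to the limit.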
I hope everyone else who was affected has this resolved now. I am looking forward to the new vROps release with fixes.
Just a note:
The above workarounds actually disable the SMART metrics for the vSAN adapters. I have checked here: Metrics for vSAN Cache Disk
It looks like disabling the SMART metrics might not be in our best interest, so we might need to revisit this.
Has anyone else had any other issues after applying the hotfix or any of the workarounds?
Thanks!