Re: Old Data Receiving

wbabineaux · 11-16-2017

Hi all,

I have a Large vROps 6.6.1 cluster with 5 Master/Data nodes and 4 Remote Collectors. There are 4 vCenter adapters and 1 vSAN adapter.

This past Monday I switched from the MPSD (Management Pack for Storage Devices) adapter to the vSAN adapter. I have removed the old MPSD adapter and confirmed that the objects it monitored have been removed as well.

Since switching over to the vSAN adapter, I have noticed that nearly all of the objects in that adapter instance have a collection status of "Old Data Receiving". Some objects are collecting current data; others only have old data. I understand that "Old Data Receiving" means the data is not current and is behind by 5 polling cycles.
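To make the "behind by 5 polling cycles" rule concrete, here is a minimal sketch that classifies an object by the age of its newest data point. It assumes a 5-minute collection interval (check the interval configured on your adapter instance), and the names COLLECTION_INTERVAL, STALE_CYCLES, cycles_behind and collection_status are hypothetical. It only illustrates the threshold described above; it is not how vROps computes the status internally.

```python
from datetime import datetime, timedelta

# Assumption: a 5-minute collection interval; change this to match your adapter instance.
COLLECTION_INTERVAL = timedelta(minutes=5)
# Threshold from the post: data 5 or more polling cycles old counts as "old".
STALE_CYCLES = 5


def cycles_behind(last_data: datetime, now: datetime) -> int:
    """Whole collection cycles elapsed between the newest data point and now."""
    return int((now - last_data) / COLLECTION_INTERVAL)


def collection_status(last_data: datetime, now: datetime) -> str:
    """Hypothetical classifier mirroring the 5-cycle rule described in the post."""
    if cycles_behind(last_data, now) >= STALE_CYCLES:
        return "Old Data Receiving"
    return "Collecting"


now = datetime(2017, 11, 16, 12, 0)
print(collection_status(now - timedelta(minutes=12), now))  # Collecting (2 cycles behind)
print(collection_status(now - timedelta(minutes=30), now))  # Old Data Receiving (6 cycles behind)
```

Whether the cut-off is "5 or more" or "more than 5" cycles is itself an assumption here; the comparison is a single operator to change.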

My questions:

- Does this status simply mean there is no new data?

- Should I be concerned that alerts won't trigger because current data is not being received?

Thanks for any and all help!

jasnyder · 11-16-2017

Dumb question, but is the vSAN performance service turned on for the vSAN clusters?

From the 6.5 docs:

When you create a vSAN cluster, the performance service is disabled. Turn on vSAN performance service to monitor the performance of vSAN clusters, hosts, disks, and VMs.

About this task

When you turn on the performance service, vSAN places a Stats database object in the datastore to collect statistical data. The Stats database is a namespace object in the cluster's vSAN datastore.

Prerequisites

- All hosts in the vSAN cluster must be running ESXi 6.5 or later.
- Before you enable the vSAN performance service, make sure that the cluster is properly configured and has no unresolved health problems.

Procedure

1. Navigate to the vSAN cluster in the vSphere Web Client navigator.
2. Click the Configure tab.
3. Under vSAN, select Health and Performance.
4. Click Edit to edit the performance service settings.
5. Select the Turn On vSAN performance service check box.
6. Select a storage policy for the Stats database object and click OK.

wbabineaux · 12-05-2017

Sorry for the late response; I wanted to get some information from VMware about this first.

I checked, and the health/performance service is on. I have spoken to VMware about this, and it seems a number of customers are having the same issue. VMware has stated that it is a bug in the code: metric collection does not complete in a timely manner and times out, which is why the metrics are not up to date.

I have installed the hotfix they provided for the vSAN adapter, but the issue persists. Most, if not all, affected customers are still seeing it: some metrics are coming in correctly, while most are still behind in collection.

Engineering is looking into this further, and another hotfix is supposed to be released, though there is no date for it yet.

I will post an update as I get more information.

carvaled · 12-05-2017

I have the same issue at a large customer of mine. They also have the new hotfix applied and still see the issue.

It's back to engineering to tweak the call to the vSAN API.

wbabineaux · 01-26-2018

I wanted to come back and revisit this, as we might have it resolved, for now. I installed the vSAN hotfix (2.0.0.7192536) in our Large environments. Actually, all of our vROps clusters are Large environments, but some of our vSAN adapters have 200-900 objects.

Along with the hotfix above, I also applied some workarounds suggested by VMware engineering. We made changes to the following file:

File: /usr/lib/vmware-vcops/user/plugins/inbound/VirtualAndPhysicalSANAdapter3/conf/config.properties

Original File:

# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5

# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5

# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5

# VIM Client read timeout (ms)
VIMCLIENT_READ_TIMEOUT = 120000

Edited File:

# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5

# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5

# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5

# VIM Client read timeout (ms)
# VIMCLIENT_READ_TIMEOUT = 120000

# VMWARE WORKAROUNDS TO VSAN BUG
CIM_SERVICE_PROTOCOL = none
VIMCLIENT_READ_TIMEOUT = 900000
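If the same change has to be made on several clusters, a small script can help avoid typos. The sketch below is hypothetical (not a VMware tool): it assumes the file uses the KEY = VALUE lines shown above, comments out any active VIMCLIENT_READ_TIMEOUT or CIM_SERVICE_PROTOCOL line, appends the workaround block, and keeps a .bak copy. The helper names apply_workaround and patch_file are made up for this example; only the file path and the two settings come from the post.

```python
import re
from pathlib import Path

# Path quoted in the post.
CONFIG = Path("/usr/lib/vmware-vcops/user/plugins/inbound/"
              "VirtualAndPhysicalSANAdapter3/conf/config.properties")

# The two settings VMware engineering suggested in this thread.
WORKAROUND = {
    "CIM_SERVICE_PROTOCOL": "none",
    "VIMCLIENT_READ_TIMEOUT": "900000",
}

# Assumes active settings look like "KEY = VALUE"; comment lines start with "#" and never match.
SETTING = re.compile(r"\s*([\w.]+)\s*=")


def apply_workaround(text: str) -> str:
    """Comment out active definitions of the workaround keys, then append the workaround block."""
    out = []
    for line in text.splitlines():
        match = SETTING.match(line)
        if match and match.group(1) in WORKAROUND:
            line = "# " + line.lstrip()
        out.append(line)
    out.append("")
    out.append("# VMWARE WORKAROUNDS TO VSAN BUG")
    out.extend(f"{key} = {value}" for key, value in WORKAROUND.items())
    return "\n".join(out) + "\n"


def patch_file(path: Path = CONFIG) -> None:
    """Save a .bak copy next to the original, then write the patched file in place."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(apply_workaround(original))
```

The script is not idempotent: running it twice comments out the previous workaround lines and appends a second block. Given the later note in this thread that these settings disable the SMART metrics, keeping the .bak copy makes the change easy to revert.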

I now have this installed in 2 clusters that were seeing the "Old Data Receiving" issue. We now seem to be receiving metrics, although some of the vSAN metrics seem to be 20 minutes behind, which I would expect since we increased VIMCLIENT_READ_TIMEOUT. I will wait for the weekend to pull a full log bundle to confirm that the cluster is no longer logging the errors related to the vSAN bug.
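The 20-minute lag lines up with the new timeout: 900000 ms is 15 minutes, up from 120000 ms (2 minutes), so a single slow read can now span three 5-minute collection cycles. The sketch below does that arithmetic. The 5-minute cycle and the "timeout plus one cycle" lag model are assumptions for illustration, not documented vROps behavior, and the helper names are hypothetical.

```python
# Assumption: a 5-minute collection cycle.
CYCLE_MINUTES = 5
MS_PER_MINUTE = 60_000


def timeout_minutes(timeout_ms: int) -> float:
    """Convert a VIMCLIENT_READ_TIMEOUT value (milliseconds) to minutes."""
    return timeout_ms / MS_PER_MINUTE


def estimated_lag_minutes(timeout_ms: int, cycle_minutes: int = CYCLE_MINUTES) -> float:
    """Rough lag model: a read that runs until the timeout, plus one collection cycle."""
    return timeout_minutes(timeout_ms) + cycle_minutes


for label, ms in [("original", 120_000), ("workaround", 900_000)]:
    print(f"{label}: timeout {timeout_minutes(ms):g} min, "
          f"estimated lag ~{estimated_lag_minutes(ms):g} min")
# original: timeout 2 min, estimated lag ~7 min
# workaround: timeout 15 min, estimated lag ~20 min
```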

I hope everyone else who was affected has this resolved now. I am looking forward to a new vROps release that includes the fixes.

wbabineaux · 03-06-2018

Just a note:

The workarounds above actually disable the SMART metrics for the vSAN adapters. I checked this here: Metrics for vSAN Cache Disk

Disabling the SMART metrics may not be in our best interest, so we might need to revisit this.

Has anyone else had other issues after applying the hotfix or any of the workarounds?

Thanks!
