Contributor

Old Data Receiving

Hi all,

I have a Large vROps 6.6.1 cluster. We have 5 Master/Data nodes and 4 Remote Collectors. There are 4 vCenter adapters and 1 vSAN adapter.

I switched from the MPSD to the vSAN adapter this past Monday. I have removed the old MPSD adapter and have confirmed the objects it monitored have been removed as well.

Since switching over to the vSAN adapter, I have noticed that pretty much all of the objects in that adapter instance have a collection status of "Old Data Receiving". Some objects are collecting, some have old data. I know that "Old Data Receiving" means the data is not current and is behind by 5 polling cycles.
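To make the "behind by 5 polling cycles" rule concrete, here is a minimal sketch of how that staleness check could be expressed. The 5-minute collection interval and the function names are assumptions for illustration only; they are not taken from vROps itself.

```python
from datetime import datetime, timedelta

# Assumption: a 5-minute collection interval; adjust to your adapter's setting.
COLLECTION_INTERVAL = timedelta(minutes=5)
# From the post: "Old Data Receiving" means data is behind by 5 polling cycles.
STALE_AFTER_CYCLES = 5


def missed_cycles(last_sample: datetime, now: datetime,
                  interval: timedelta = COLLECTION_INTERVAL) -> int:
    """Return how many whole collection intervals have elapsed since the last sample."""
    elapsed = now - last_sample
    if elapsed < timedelta(0):
        return 0  # a sample stamped in the future counts as current
    return elapsed // interval


def is_old_data(last_sample: datetime, now: datetime,
                interval: timedelta = COLLECTION_INTERVAL) -> bool:
    """Hypothetical check: True once the last sample is 5 or more cycles old."""
    return missed_cycles(last_sample, now, interval) >= STALE_AFTER_CYCLES
```

With these assumed values, an object whose last sample is 26 minutes old would count as old data, while one whose last sample is 24 minutes old would not.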


My questions:

- Does this status simply mean there is no new data?

- Should I be concerned with not receiving actual data to trigger the alerts?

Thanks for any and all help!

Hot Shot

Dumb question, but is the vSAN performance service turned on for the vSAN clusters?

From the 6.5 docs:

When you create a vSAN cluster, the performance service is disabled. Turn on vSAN performance service to monitor the performance of vSAN clusters, hosts, disks, and VMs.

About this task

When you turn on the performance service, vSAN places a Stats database object in the datastore to collect statistical data. The Stats database is a namespace object in the cluster's vSAN datastore.

Prerequisites

  • All hosts in the vSAN cluster must be running ESXi 6.5 or later.
  • Before you enable the vSAN performance service, make sure that the cluster is properly configured and has no unresolved health problems.

Procedure

  1. Navigate to the vSAN cluster in the vSphere Web Client navigator.
  2. Click the Configure tab.
  3. Under vSAN, select Health and Performance.
  4. Click Edit to edit the performance service settings.
  5. Select the Turn On vSAN performance service check box.
  6. Select a storage policy for the Stats database object and click OK.
Contributor

Sorry for the late response, I wanted to get some information from VMware regarding this.

I checked whether the health/performance service was on, and it is. I have spoken to VMware about this and it seems a number of customers are having the same issue. VMware has stated it is a bug in the code: the metrics are not being collected in a timely manner and the collection is timing out. This is why the metrics are not up to date.

I have installed the hotfix they provided for the vSAN adapter, but the issue persists. Most, if not all, customers are still seeing it: some metrics are coming in correctly, but most are still behind in the collection.

Engineering is looking into this further and another hotfix is supposed to be released, but there is no date for it yet.

I will update as I get back more information.

Enthusiast

I have the same issue at a large customer of mine. The new hotfix is applied there too, and the issue is still present.

Back to engineering to tweak the call to the vSAN API :)

Contributor

I wanted to come back and revisit this, as we might have it resolved for now. I installed the vSAN hotfix (2.0.0.7192536) in our Large environments. Actually, all of our vROps clusters are Large environments, but we do have some vSAN adapters with 200-900 objects.

Along with the hotfix above, I also put in some workarounds that were suggested by VMware engineering. We made changes to the file below:

File:  /usr/lib/vmware-vcops/user/plugins/inbound/VirtualAndPhysicalSANAdapter3/conf/config.properties

Original File:

# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5

# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5

# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5

# VIM Client read timeout (ms)
VIMCLIENT_READ_TIMEOUT = 120000

Edited File:

# Frequency the resource collection should take place
RESOURCE_DISCOVERY_FREQUENCY = 5

# vCenter resources cache update frequency
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5

# Pool sizes for discovery & collection
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5

# VIM Client read timeout (ms)
# VIMCLIENT_READ_TIMEOUT = 120000

# VMWARE WORKAROUNDS TO VSAN BUG
CIM_SERVICE_PROTOCOL = none
VIMCLIENT_READ_TIMEOUT = 900000

I now have this installed in 2 clusters that were seeing the "Old Data Receiving" issue. I can now say that we seem to be receiving metrics, although some of the vSAN metrics seem to be 20 minutes behind, which I would expect since we increased VIMCLIENT_READ_TIMEOUT. I will wait for the weekend to pull a full log bundle to confirm the cluster is no longer seeing the errors pertaining to the vSAN bug.
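For anyone who wants to double-check what actually changed between the two versions of config.properties, here is a small sketch that parses both and lists the differences. The helper names are hypothetical and the parser only handles the simple KEY = VALUE form shown above, not the full Java properties syntax.

```python
def parse_properties(text: str) -> dict:
    """Parse simple KEY = VALUE lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def diff_properties(original: dict, edited: dict) -> dict:
    """Return {key: (old, new)} for keys that were added, removed or changed."""
    changes = {}
    for key in sorted(set(original) | set(edited)):
        old, new = original.get(key), edited.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Contents copied from the two versions of the file quoted in this post.
ORIGINAL = """
RESOURCE_DISCOVERY_FREQUENCY = 5
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
VIMCLIENT_READ_TIMEOUT = 120000
"""

EDITED = """
RESOURCE_DISCOVERY_FREQUENCY = 5
VCENTER_RESOURCE_DISCOVERY_FREQUENCY = 5
DISCOVERY_POOL_SIZE = 3
COLLECTION_POOL_SIZE = 5
# VIMCLIENT_READ_TIMEOUT = 120000
CIM_SERVICE_PROTOCOL = none
VIMCLIENT_READ_TIMEOUT = 900000
"""

changes = diff_properties(parse_properties(ORIGINAL), parse_properties(EDITED))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")

# The timeout is in milliseconds; convert it to minutes for readability.
timeout_ms = int(parse_properties(EDITED)["VIMCLIENT_READ_TIMEOUT"])
print(f"read timeout: {timeout_ms / 60000:.0f} minutes")
```

Only two keys differ: CIM_SERVICE_PROTOCOL is new, and VIMCLIENT_READ_TIMEOUT goes from 120000 ms (2 minutes) to 900000 ms (15 minutes).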

I hope everyone else who was affected has this resolved now. I am looking forward to the new vROps release with fixes.

Contributor

Just a note:

The above workarounds actually disable the SMART metrics for the vSAN adapters. I have checked here: Metrics for vSAN Cache Disk

It looks like disabling the SMART metrics might not be in our best interest, so we might need to revisit this.

Has anyone else had any other issues after applying the hotfix or any of the workarounds?

Thanks!
