VMware Cloud Community
gefleborg
Contributor
Contributor

After 6.7 U3 vCheck hangs/is slow in step "01 General Information" due to function "Get-VIEventPlus" is much slower

I asked this last week in GitHub without any reaction. Let's try here......
I upgraded two of my four vSphere clusters from 6.7.0 11727113 to 6.7.0 14368073. I also upgraded two test clusters.
The test clusters now executes vCheck in 15-20 minutes instead of 0-1 minute.

The production cluster with vCenter Essential license (3 ESXi) still works to run vCheck but the schedule now takes +10 minutes instead of 1 minute.
The production cluster with vCenter Standard license contains 32 ESXi and before upgrade of ESXi the report was executing around 15-20 minutes before done.
After that the time increased with the number of upgraded ESXi. When all 32 ESXi was upgraded (after 7 days) the report is taking days to finnish.

The step with most issue is "General Information" that in the last completed report executed in 2687 minutes of report completed time for 2843 Minutes. (I disabled that step and from this weekend the execution time have been 25-30 minutes.

I have found that function "Get-VIEventPlus" is terrible slow after my upgrade. That function seems to be called on several times in vCheck (but the first from "General Information" is most bad).

Any ideas on what to do or how to troubleshoot or debug? My last efford was to upgrade powercli so it's version is 11.4.0.14413515 without differens.

/Peter

0 Kudos
7 Replies
LucD
Leadership
Leadership

The Get-VIEventPlus function, of which I'm the author, uses the EventHistoryCollector to retrieve events from the VCSA.

At first sight, there is no link with the PowerCLI version, since the function is calling the API method directly.

Since I currently have no knowledge of any changes in the EventHistoryCollector between the 2 versions you mention, I will have to do some testing.

When you say "I upgraded two of my four vSphere clusters", do you mean only the ESXi nodes in those clusters?

Did you also upgrade your VCSA? Which version is that VCSA?
The events are retrieved from the VCSA DB, so there shouldn't be any impact from the ESXi node version afaik.

A quick change you could try, currently, the Get-VIEventPlus function uses a window of 100 events ($eventnumber).

You could try increasing that number (maximum possible is 1000) to see if that makes any change.


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

0 Kudos
gefleborg
Contributor
Contributor

Hi Luc!

VCSA from 6.7.21000 to 6.7.40000 via GUI from internet.

ESXi from 6.7.0, 11675023 to 6.7.0, 14320388 via update manager. Also upgrading HPE hw to new FW level.

I have (per cluster) upgraded VCSA before any ESXi upgrade. Of cource I should have noticed this delay in my test.......

It's my production cluster for server that have the vorse scenario. For this cluster the execution time gets vorse the more om my 32 ESXi that is upgraded. If its intresting I can extract numbers and times from the daily reports.

The schedule is running on a windows 2016 server with separate directories for vCheck per cluster.

I have the the "get-vieventplus" in an older version on my Windows 10 PC (I think UTC is missing in that one) and my imagination is that it runs slower than earlier.

Yesterday I changed to 1000 but I interupted the report since it was slow.

Today I made a copy of the orignial directory on the same server and disabled every plugin except 00 and 01.

I started it entering ".\vCheck.ps1 -Outputpath d:\" in powershell_ise and it was interruped with an error. I tested once more with the same error. The output timestamp says 

[09:46:21] ..start calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 2]

[09:51:21] ..finished calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 2]

with an error between

and the second time

[10:08:29] ..start calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 2]

[10:13:30] ..finished calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 2]

with the same errors between. I defently match the timeout mentioned in the error message.

Third time I set a breakpoint in powershell_ise at the row 336 (that is mention in the error) and when it breaked I just pressed F5 to continue. This was +90 minutes ago so no time out this time. I'll let it run to see what happends.

[10:23:10] ..start calculating Connection settings for vCenter by Alan Renouf v1.20 [1 of 2]

[10:23:34] ..finished calculating Connection settings for vCenter by Alan Renouf v1.20 [1 of 2]

[10:23:34] ..start calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 2]

Hit Line breakpoint on 'D:\v100a011-vCheckXX\Plugins\00 Initialize\00 Connection Plugin for vCenter.ps1:336'

[DBG]: PS D:\v100a011-vCheckXX>>

Here is the error from earlier:

Exception calling "CreateCollectorForEvents" with "1" argument(s): "The request channel timed out while waiting for a reply after 00:05:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding.

The time allotted to this operation may have been a portion of a longer timeout."

At D:\v100a011-vCheckXX\Plugins\00 Initialize\00 Connection Plugin for vCenter.ps1:336 char:10

/Peter

0 Kudos
LucD
Leadership
Leadership

That last error is the infamous 300 seconds timeout for web operations.

You can increase this or disable it, with the Set-PowerCLIConfiguration cmdlet and the WebOperationTimeoutSeconds parameter.

But that will just avoid the error, not fix the longer runtime.

I'm doing some tests if I can replicate those longer runtimes for collecting events.
And checking how I could eventually improve the runtime duration.

On a side note, is the Get-VIEvent cmdlet also showing longer runtime?

If possible can you compare a run of both, Get-VIEvent and Get-VIEventPlus, for a reasonably long query?
Wondering if a fix that was recently introduced, might be causing this.
That fix was on the VCSA and for the "Error in deserializing body of reply message for operation 'RetrieveProperties'" error.


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

0 Kudos
gefleborg
Contributor
Contributor

This is way before HTML formatting so what is a web operation doing when just plugin 00 and 01 is executing? Also, why not every execution?

I have try a little with Get-VIEvent compared to Get-VIEventPlus and whitout having any reference to before upgrade I should say the are both slower now.

It might be that you are on to something about Get-VIEvent since we one other day feldt that going to monitor->events sometime was slower in htlm5 gui.

As a comment the execution I started earlier is still running in powershell_ise. I cannot see that the VCSA is suffering, it's still managable as far as I see. It might be a little bit slower after upgrade but that's a subjective note. The Powershell_ise session is using every bit of ram in the windows server (5GB out of 6GB) so I will cancel it since that server have other work to handle. It started the execution 10.23 and now time is 15.37. I had to terminate it since it was not responding. Maybe the heavy memory use is the issue?

I Started a new powershell_ise and executed vCheck on my little essensial cluster with 3 ESXi and only 53 VMs. Both VCSA and ESXi is on the "trouble" version'.

I saw in Task Manager of windows server that memory usage was almost 3 GB at most. The "trouble" step executed in

[15:12:07] ..start calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 103]

[15:19:23] ..finished calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 103]

The execution of the full vCheck was before upgrade never longer than one minute before upgrade (since June this year). This time 9 minutes. Last schedules it was 9-10

After that I executed vCheck on a not upgraded VXRail cluster with 16 ESXi and 400 VMs.

In task manager the memory for ISE was newer more that 1,5 GB during this step that executed in

[15:21:48] ..start calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 113]

[15:27:06] ..finished calculating General Information by Alan Renouf, Frederic Martin v1.3 [2 of 113]

Total report time is 12 minute this time, during schedule (at night) 9-12 minutes is normal.

Good luck with reproducing thisSmiley Happy

/Peter

0 Kudos
LucD
Leadership
Leadership

All calls to vSphere API with SOAP are considered web operations.

This is not related to the creation of the HTML file.


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

0 Kudos
gefleborg
Contributor
Contributor

Hi Luc!

I just wonder if you got any time so recreate this?

I'm positive now that everything with event (gui and powercli) is struggling more now.

With step "General Information" disabled it now runs in 1 to 2 hours.

Next week I'll be at VMWorld (power cli deep listening to you is booked) and after that week the planning starts to upgrade two more clusters where this vCheck is running (i takes 10 minutes now).

/Peter

0 Kudos
LucD
Leadership
Leadership

This is a known issue I'm afraid.

A bit of history to explain.

In vSphere 6.5 the way the VCSA used events was somewhat changed.
As a result older PowerCLI versions got the infamous error "Error in deserializing body of reply message for operation 'RetrieveProperties'."

This issue was fixed in PowerCLI 11.x, but as a result, the time to get the events dramatically increased.

To show the PowerCLI Team that this an issue I would suggest to open a SR or at least mention this to your TAM.

That way the PowerCLI Team will know that many users suffer from this.

See you in Barcelona Smiley Happy


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

0 Kudos