A connection that was expected to be kept alive wa...

SCC · 05-28-2020

Hello everyone!

Not sure if this belongs here or in the vCenter forums, networking or what. But here goes.

We have a single VCSA 6.5 and 4 ESXi 6.5 hosts.

PowerCLI 11.5

The servers are variants of the Dell 7xx, with 10G fiber to Dell switches and 10G fiber to 2 PS6xxx SAN devices. Recently (Feb) one of the switches was replaced.

The problem:

Using PowerCLI, I run a script that simply loops to deploy multiple copies of the same template. I have been using this same code for years with various tweaks along the way, none recently. What used to take 10 or 15 minutes to deploy a basic Windows 10 VM (40G thin, 20G data) now takes over an hour. As the script runs, I get the error message from the subject line: "A connection that was expected to be kept alive was closed by the server." The error is displayed, the VM deployment continues (for a long time), and eventually a viable VM is deployed. The error can happen several times during the deployment of one VM.

I have verified jumbo frames on each of the vmks, the switch, and the SAN. I have installed Dell MPIO and configured it per their best-practice papers.

I have pinged each address on the SAN from each vmk interface on each server with vmkping -I vmkx X.X.X.X, and the results are consistently in the 0.1 ms range (a little above, a little below).

Short of this, I have little knowledge of problem solving for something like this. I don't know if it is the network between the ESXi hosts and the SAN or something in vCenter causing this. I found one message about a TLS mismatch and tried the recommended solution to no avail.

If anyone has any ideas or needs more info, please let me know.

Don

LucD · 05-28-2020

Is it the actual New-VM that takes much longer?

Or any code after that in your script?

You could start by looking at the events generated by the creation of the VM, and watch the timestamps.

If that takes long, I would dive into the vpxd log and watch the individual calls for the creation of the VM in there.

You can extract all the entries with the same opid.

I even have a function for that, see Get Your Tasklog Here!
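For anyone who has already copied vpxd.log off the appliance, the opid filtering described above can be sketched in a few lines of Python. This is only a rough sketch and not the function mentioned in this post: the helper names `entries_for_opid` and `slowest_gaps` are made up for illustration, and it assumes each log line starts with an ISO-style timestamp and carries an `opID=<id>` tag somewhere in the line. The match is a plain substring test, so longer IDs that start with the same text are picked up as well.

```python
import re
from datetime import datetime

# Assumption: every relevant line begins with a timestamp such as
# 2020-05-30T10:00:00.000Z and contains "opID=<id>" somewhere after it.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{1,6})")


def entries_for_opid(lines, opid):
    """Return (timestamp, line) pairs for the lines tagged with this opID."""
    tag = "opID=" + opid
    found = []
    for line in lines:
        if tag not in line:
            continue
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip continuation lines that carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        found.append((stamp, line.rstrip("\n")))
    return found


def slowest_gaps(entries, top=3):
    """Return the largest pauses between consecutive entries.

    Each result is (seconds_waited, line_that_ended_the_wait).
    """
    gaps = []
    for (earlier, _), (later, line) in zip(entries, entries[1:]):
        gaps.append(((later - earlier).total_seconds(), line))
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

To use it, read the log into a list of lines (for example `open("vpxd.log").readlines()`), pass the opid taken from the deploy task, and look at the lines that follow the longest pauses.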

Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference

SCC · 05-28-2020

Thanks for the quick response LucD

Yes, I am confident it is the New-VM cmdlet that is the cause, because I ran that same syntax directly from the prompt and got the same result. The progress bar seems to proceed very quickly to about 40%, then starts moving at a snail's pace until maybe 80%, and finishes quickly.

I will follow your suggestions and get back if I do/don't find anything.

Take care and stay safe.

Don

SCC · 05-30-2020

The events on the VM in the web client were no help, just an entry when the deploy started and one when it ended.

I did capture the vpxd.log entries during the deployment, and there were some warnings and errors. Unfortunately it is all pretty much Greek to me as they say. I will run this up the ladder and hope we can find some $$$ to open a case with VMware support.

Thanks.

LucD · 05-30-2020

Just a thought: are you running this in an isolated environment, with no Internet access?

If so, there might be some Certificate Revocation Checks or DNS root server lookups that are slow due to timeouts.

In that case, have a look at PowerCLI very slow to load on a Windows 2012R2 Server for some solutions for those issues.
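A quick way to see whether name lookups are stalling on the machine that runs the script is to time one. The Python sketch below is only an illustration of that idea, not something from the linked thread: `time_dns_lookup` is a made-up helper, and `crl.example.com` is a placeholder to be replaced with whatever host the revocation check actually contacts.

```python
import socket
import time


def time_dns_lookup(hostname, port=443):
    """Resolve hostname once and return (resolved_ok, seconds_taken)."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, port)
        resolved = True
    except socket.gaierror:
        # The lookup failed; the elapsed time still shows how long it hung.
        resolved = False
    return resolved, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder host name, assumed for illustration only.
    ok, seconds = time_dns_lookup("crl.example.com")
    print(f"resolved={ok} after {seconds:.2f}s")
```

A lookup that takes several seconds before failing would point toward the timeout scenario described above; a fast answer suggests looking elsewhere.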

SCC · 06-03-2020

Thanks for the idea, LucD. The environment is not isolated. Our 4 ESXi servers are in a cluster with DRS enabled. I took one of the servers out of the cluster and deployment time was cut roughly in half. That is still twice as long as expected, but more tolerable.

Thanks again for the input.

Don

A connection that was expected to be kept alive was closed by the server.