VMware Cloud Community
compwizpro
Enthusiast

vSphere loses connection to ESXi host management during linked-clone provisioning

Hi. I have a vSphere cluster of two hosts running ESXi 5.5 U3, managed by vCenter 6.0 U1. Each host has both 10GbE and 1GbE connectivity. I have deployed a Horizon 6.2 test lab in that cluster to test linked-clone provisioning. I am also using NFS storage on Isilon over 10GbE, on the same subnet, with hardware acceleration enabled.

I have configured my Horizon environment with View Composer to provision linked clones. I built a Windows 7 test machine with the View Agent and prepared it according to the VMware lab deployment guide for Horizon 6.2. I created a floating linked-clone desktop pool, initially set to provision only 2 desktops. Both desktops provisioned successfully, and I can connect to them with the Horizon client.

When I try to provision more machines, such as 10, I can see the tasks (cloning the replicas, etc.) kick off in the vSphere Client. Then some or all of the progress bars hang at the same percentage for a few minutes, after which one or both of the hosts go grey in a disconnected state, along with all the VMs running on them. I still have network connectivity to the VMs, but I can't manage them in the vSphere Client, and all subsequent provisioning tasks time out and fail. Sometimes the hosts come back on their own; otherwise I have to restart the management agents manually, which only sometimes works. I can still ping the hosts and connect to them directly with the vSphere Client, but vCenter cannot communicate with them.

The hosts' management interface IPs are on the same subnet as the vCenter server. I have tried multiple network configurations. I tried a version 5.5 vSphere Distributed Switch with both static and ephemeral port binding for the port group used by the linked clones. I have also tried standard switches for all network interfaces, after deleting the distributed switch, with the same result.

I have the same configuration in my home lab, which runs ESXi 6.0 with vSphere 6.0 and a version 6.0 distributed switch with static port binding, and there I was able to provision 20 desktops with no issues.

Is there a network configuration issue I am missing, or a bug in the 5.5 distributed switch that could cause this?

Let me know if you need more information, and thanks for the help! Also, if there is a better forum for this topic, please point me to it.

joshopper
Hot Shot

Have you tried physically or logically separating your management traffic?

compwizpro
Enthusiast

Thanks for the response! 

Originally I had my management traffic on the same NIC as everything else, but I moved it to its own vSwitch with its own physical NIC and still have the same issue.

Is there a log that would show whether I am hitting network congestion or dropping packets? I highly doubt I am getting congestion over 10GbE, but I can't be certain.
