We have vCloud Director running many organisations and many VMs without a problem, but every once in a while (after a reboot) a VM will lose its network connection. vCloud shows the NIC as connected, but the vDS (and the properties of the VM) show it as disconnected/link down.
Sometimes we can fix this by vMotioning the VM onto another host, but today we have one that just will not connect. We have tried:
I've seen this KB article, but it has not resolved the problem:
Is anyone able to suggest anything which may help?
We see an issue like this once in a while, and at least in our environment it is always the same thing: the DVS has used up all the ports on a given host and simply can't make the connection.
We had to change (increase) a setting on each host: "Networking" -> "DVS" -> "switchX" -> "Properties" -> "Maximum number of ports designated on this host".
The default is 256; we had to increase it to 1024 in order to stop running out of ports.
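As a rough sanity check on that limit (the VMkernel overhead and per-VM NIC counts below are illustrative guesses, not figures from either of our environments), the per-host port budget works out like this:

```python
# Back-of-the-envelope check for per-host dvSwitch port exhaustion.
# Each vNIC consumes one dvSwitch port on the host its VM runs on;
# vmk_ports (VMkernel/management overhead) is an illustrative assumption.

def ports_needed(vms_on_host: int, nics_per_vm: int, vmk_ports: int = 4) -> int:
    """Estimate how many dvSwitch ports a single host must supply."""
    return vms_on_host * nics_per_vm + vmk_ports

def limit_sufficient(needed: int, max_ports: int = 256) -> bool:
    """True if the host-level 'maximum number of ports' setting covers demand."""
    return needed <= max_ports
```

For example, 90 VMs with three vNICs each already need 274 ports, which blows past the 256 default but sits comfortably under 1024.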
Hope this helps.
Also, if this is an isolated network backed by vCloud Network Isolation: when the VM boots, it tries to enact a vsla-fence module on the ESXi host (part of the vCD agent). If the agent was having issues, you might have to re-install it to get things running correctly. An easy test: disable one host and redeploy its VMs, then re-prepare the host, then move a VM onto it and try to connect it. If that works, repeat for all hosts.
Thanks both for your responses. I've tried both suggestions and neither has made a difference (the hosts were already set to 1024 ports). I've also unprepared / re-prepared every host in the cluster just in case.
Interestingly, I do get an error when I try to start monitoring the port state on the port group, and only on the port group that contains the problem VM:
Cannot complete a vSphere Distributed Switch operation for one or more host members
vDS operation failed on host xxx, got (vim.fault.VimFault) exception
I have googled the error but only found suggestions relating to the Nexus 1000V, which is not used in this environment.
In my case I did it manually in the VM settings, because the problem sometimes occurs on only one master image. It seems the DVS loses the information about the related virtual machine NIC; I've noticed this happens after some snapshot activity (revert or delete). I don't know if it is possible to script this kind of operation.
Let me know if the workaround solved your problem!
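The manual workaround above (untick and re-tick "Connected" on the vNIC in the VM settings) can in principle be scripted against the vSphere API. Here is a rough pyVmomi sketch; the way you obtain the `vm` object (SmartConnect plus an inventory lookup) and any names involved are placeholders, not something from this thread:

```python
# Sketch: script the manual "reconnect the vNIC" workaround via pyVmomi.
# Requires: pip install pyvmomi. All identifiers here are illustrative.

def pick_nics(devices):
    """Ethernet cards are the only virtual devices carrying a MAC address,
    so filter the VM's hardware list on that attribute."""
    return [d for d in devices if getattr(d, "macAddress", None)]

def reconnect_nics(vm):
    """Mark each vNIC as connected and push a single reconfigure task.

    `vm` is a pyVmomi vim.VirtualMachine, e.g. obtained via
    pyVim.connect.SmartConnect and a container-view inventory lookup.
    """
    from pyVmomi import vim  # imported lazily; pick_nics stays dependency-free
    changes = []
    for dev in pick_nics(vm.config.hardware.device):
        dev.connectable.connected = True
        dev.connectable.startConnected = True
        spec = vim.vm.device.VirtualDeviceSpec()
        spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.edit
        spec.device = dev
        changes.append(spec)
    if changes:
        vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=changes))
```

One reconfigure task carrying all the device edits keeps it to a single vCenter operation, which mirrors what the "Edit Settings" dialog does when you apply the change by hand.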