We have a strange problem: after a Recompose, a few clients lose their network connectivity (red X in Windows Explorer), but there is no real connectivity issue and there are no errors in VMware.log on the clients or the Connection Servers. If you reboot the client, everything is OK and the error is gone. We traced with Wireshark at every point (client, server, ...) but found nothing.
We also replaced our core switch, upgraded our network from 1G to 10G, changed the gateway and a lot more, but nothing worked.
That's why I want to look into NETLOGON error 5722 during the time the Recompose is running. Some clients log NETLOGON error 5722 and some do not, although the image is the same.
We use QuickPrep for the Recompose; our domain is a 2008 R2 Active Directory domain.
We don't know how to track this error down. Out of 150 clients, 1 or 2 have it. If they reboot the machine everything is fine, but this is not normal...
thx for your help,
I suspect the ports on the vSwitch or vDS, whichever you are using. Before rebooting, find the port the affected VM is connected to and refresh it, then see if the connection is restored.
I would also set static binding on the portgroup being used by the VDI desktops.
Sorry, I was on holiday for a week, which is why my answer is so late.
The next time the error occurs I will try refreshing the ports.
About the settings of our vSphere Distributed Switch:
Static port binding is enabled on the clients' VLAN and on the uplink of the vSphere Distributed Switch.
thx a lot
How big are your DHCP pools, and are they filling up? I'd make sure the lease time is as short as technically possible. Also make sure you have all the recommended patches installed; if you're still using Windows 7 desktops, I've seen this happen when some VMware-related patches are missing. I'd make sure you have the ones listed here
It's not an official list, but I think all the ones listed there are necessary. I saw the exact issue with Windows 7 desktops, and I think it was because I didn't have the VMXNET 3 patch that's referenced there installed.
The DHCP lease time is set to 1 hour.
The DHCP scope was not full, but I see that there are not many free IP addresses left. That's strange, because we have only about 120 virtual desktops in this scope and there should be about 100 IPs free (DHCP IP reservations .1 to .30, scope from .30 to .254). I have added a LeaseExtension DWORD key set to 4 hours to shorten the grace period.
Thanks for your post about the Windows updates.
We had not installed the recommended VMXNET 3 hotfix (3125574 Convenience Rollup). I installed it yesterday, but the error still occurs 😞
Best Regards Michael
Thx for your answer.
Yesterday I had the error again: one client lost all its network connections in Explorer.
I tried refreshing the vDS port group, but it didn't help.
Another possibility I've seen is this:
If Windows updates the time around the customization time, the clock may jump past the DHCP lease expiration, which interferes with the customization process. If the clone can't communicate, it restarts the VM at least once and tries again.
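To make the clock-jump idea concrete, here is a minimal Python sketch. It only does the arithmetic: given when a lease was obtained, its duration, and a forward time step, it says whether the lease looks expired from the local clock's point of view. The function name and the example timestamps are made up for illustration; how Windows actually reacts in this situation is not shown here.

```python
from datetime import datetime, timedelta


def lease_expired_after_step(lease_obtained, lease_duration, clock_before, step):
    """Return True if the clock, after being stepped forward by `step`,
    is at or past the lease expiry time (obtained + duration)."""
    expiry = lease_obtained + lease_duration
    return clock_before + step >= expiry


# Hypothetical values: a 1-hour lease (as in this thread) obtained at 12:00,
# with the guest clock reading 12:05 just before the time correction.
obtained = datetime(2020, 1, 1, 12, 0)
lease = timedelta(hours=1)
before = datetime(2020, 1, 1, 12, 5)

print(lease_expired_after_step(obtained, lease, before, timedelta(hours=2)))
print(lease_expired_after_step(obtained, lease, before, timedelta(minutes=10)))
```

With a 1-hour lease, only a step of close to an hour or more would cross the expiry, so a large clock offset in the image or on a host would be the thing to look for.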
Could you confirm whether the affected VMs had a 169.254.x.x (APIPA) address before rebooting? You mentioned that fewer IPs are available than expected. The likely cause is the recompose: desktops with new MAC addresses get a new lease, while the old MAC addresses hold on to their existing leases until expiration.
Even if DHCP is not the issue in this case, I would cut the lease time to 10 minutes. I have been doing VDI for many years, and that will save you a lot of headaches during recomposes. There is essentially zero risk.
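A rough way to see why a shorter lease helps is to count the leases still held by old MAC addresses. The Python sketch below uses the numbers mentioned earlier in the thread (scope .30 to .254, about 120 desktops, 1-hour lease, 4-hour LeaseExtension value). The assumptions are mine: every recomposed desktop gets a new MAC, old leases are never released early, the grace value is simply added to the lease time, and the 120 desktops are recomposed evenly over the last hour. It gives an upper bound, not a measurement.

```python
def scope_size(first_octet, last_octet):
    """Number of addresses in an inclusive last-octet range."""
    return last_octet - first_octet + 1


def stale_leases(minutes_since_recompose, lease_minutes, grace_minutes=0):
    """Upper bound on leases still held by old MACs: every desktop that was
    recomposed less than (lease + grace) minutes ago."""
    window = lease_minutes + grace_minutes
    return sum(1 for age in minutes_since_recompose if age < window)


def free_addresses(scope, active, stale):
    """Free addresses left in the scope, never below zero."""
    return max(scope - active - stale, 0)


SCOPE = scope_size(30, 254)   # 225 addresses
DESKTOPS = 120

# Assumed recompose pattern: one desktop every 30 seconds over the last hour.
ages = [0.5 * i for i in range(1, DESKTOPS + 1)]

one_hour = stale_leases(ages, lease_minutes=60, grace_minutes=240)
ten_min = stale_leases(ages, lease_minutes=10)

print(free_addresses(SCOPE, DESKTOPS, one_hour))  # 0: scope exhausted
print(free_addresses(SCOPE, DESKTOPS, ten_min))   # 86
```

Under these assumptions a full recompose can use up the whole scope with a 1-hour lease, while a 10-minute lease leaves plenty of room.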
No, the VMs that lost their network connection in Windows Explorer have no 169.x IP. For example, the user is working and wants to open or save a file, and then it tells him the network connection is not available.
You can ping all the servers that are marked as disconnected in Windows Explorer (only VMware View clients have this problem), and there is no error message in the VM's VMware.log or in the Windows events.
There are no receive errors on the vDS, and so on. I talked to the vendors of some software products that are also affected, and we traced the error. The software vendor said that there is a keep-alive timeout and the server closed the connection to the client.
That's why we changed nearly everything: new core switch, 1G to 10G (to rule out cabling errors), new load balancer for View (Citrix NetScaler), Wireshark on the ESXi host and the switch, and so on... a lot of money.
I have two thoughts:
A time error (to rule this out we changed the time source to one physical machine, but that didn't help) or a DHCP error.
DHCP: I have changed the lease time to 10 minutes. I will report back.
Yesterday I spent some time investigating the DHCP problem where we have few free addresses.
I did a DHCP cleanup of the VDI scope. After that I saw some strange ghost leases with MACs like 31302e302e39302e31353200 (hex) that all had the same lease expiry time.
I deleted them and so far they have not come back (according to a Google search this error is known; I will run Wireshark on the DHCP server for UDP 67/68 and look at which device is sending these...).
Here is a picture:
One question about Recompose: do you use "Use Existing Active Directory Accounts"? We have pools with it enabled and without (but the error occurs on both).
Best regards, Michael
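A side note on the ghost lease ID quoted above: it is 12 bytes long, so it cannot be a 6-byte Ethernet MAC. Decoding the hex as ASCII gives a readable string, which may help when filtering the Wireshark capture. The small Python sketch below does only that decoding; the function name is made up, and what kind of device sends such an ID is not something the thread confirms.

```python
def decode_client_id(hex_id):
    """Decode a hex-encoded DHCP client ID as ASCII text,
    dropping any trailing NUL terminator bytes."""
    raw = bytes.fromhex(hex_id)
    return raw.rstrip(b"\x00").decode("ascii")


ghost_id = "31302e302e39302e31353200"
print(len(bytes.fromhex(ghost_id)))   # 12 bytes, twice the size of a MAC
print(decode_client_id(ghost_id))     # 10.0.90.152
```

So the "MAC" is really an IP address written as text, which suggests looking for whatever is at (or is handing out) that address.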
- We changed the lease time to 10 minutes --> this fixed the shortage of free IP addresses on the DHCP server
- The ghost MACs are also gone
The bad news is that some VMs still lose their network connection in Explorer (pinging the target server works, there is no error message, and so on).
I have found another strange thing which could be the source of the error:
If you look in the System event log of a VM that has lost its network connection, you can see 6 Kernel-General events (time sync with the PDC).
Why do some VMs sync their time so often? In a normal session there is only 1 Kernel-General event.
Here are our time sources for the VDI ESXi hosts:
About 2 months ago we changed it to only ONE physical time source, our physical PDC server (the error was already there before we changed this to a single source!).
Time sync between guest and host (VMware Tools) is disabled.
What can I do to avoid these Kernel-General messages?
Thx Best Regards Michael
Good to hear that DHCP is back in order.
I'm not sure why the NTP lookups are happening so rapidly. It could be a symptom.
Back to your issue: the servers are pingable while Windows Explorer does not work. Are they pingable by both DNS name and IP? That part is critical, because if they are, this is likely local to the OS rather than a VMware issue.
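Since ping only proves ICMP reachability, it can also help to check name resolution and the SMB port (TCP 445) separately from an affected desktop. Below is a minimal Python sketch of such a check. The function names are made up, and a successful TCP connect still does not prove that the SMB session itself is healthy; it only narrows things down.

```python
import socket


def resolve_name(name):
    """Return the IPv4 address for a host name, or None if resolution fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def tcp_port_open(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_server(name, port=445):
    """Combine both checks for one server (name or IP) into a small report."""
    ip = resolve_name(name)
    return {
        "name": name,
        "resolved_ip": ip,
        "tcp_open": tcp_port_open(ip, port) if ip else False,
    }
```

Calling `check_server()` once with a file server's DNS name and once with its IP, on a desktop that currently shows the red X, would show whether the failure is in DNS, in the TCP path, or above both.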
Did you verify that all the hosts have NTP started and running? Virtual machines get their time from the host they are on when they are created; if one host is off, the VMs may have to do multiple updates to get the time in sync.
Next update: yesterday there was no client network drive disconnect error (I thought I had fixed it), but today there are 2 clients with the error.
Changes: I lowered the weight of the SRV records for the Active Directory services (ldap, kerberos, ...) of the physical domain controller, which is the PDC of the domain, from 100 to 25, to reduce the DNS / Active Directory storm from the clients. I wanted to check whether the PDC is overloaded by clients, and on that day we had no client disconnect errors. In the evening I ran a Recompose job on all pools, and today I had 2 client disconnect errors again; those clients had 3 Kernel-General events in the System log.
(https://technet.microsoft.com/en-us/library/cc961719.aspx) >>> SRV Records Weight.
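To get a feeling for what the weight change does, here is a small Python sketch. It assumes that clients choose among SRV records of the same priority roughly in proportion to their weight, as RFC 2782 describes. The names dc2 and dc3 and their weights of 100 are hypothetical, since the thread does not say how many other domain controllers there are.

```python
def selection_share(weights):
    """Expected share of client picks per server when selection among
    equal-priority SRV records is proportional to weight."""
    total = sum(weights.values())
    if total == 0:
        # All weights zero: treat every record as equally likely.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


# Hypothetical setup: the PDC plus two other DCs left at weight 100.
before = selection_share({"pdc": 100, "dc2": 100, "dc3": 100})
after = selection_share({"pdc": 25, "dc2": 100, "dc3": 100})

print(round(before["pdc"], 3))  # 0.333
print(round(after["pdc"], 3))   # 0.111
```

In this made-up three-DC example the PDC goes from about a third of the lookups to about a ninth, so it is relieved but still gets some of the client traffic.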
Are they pingable by DNS and IP? >>>> Yes, you can ping everything in the domain by IP and by DNS name, from client to server or server to client.
Did you verify all the hosts have NTP started and running? >>>> Yes, the service is running and configured on all ESXi hosts (we changed it to a single physical source so there is only one time server, but this didn't help).
best regards michael