lilwashu
Contributor

Windows NETLOGON 5719 at Startup

I am having what appears to be either a networking or guest startup issue with Windows 2008 R2 SP1 guest machines on my vSphere Essentials setup. The configuration is:

HP BL460c G7 servers, 72GB RAM, with built-in Emulex dual-port 10Gb NICs and mezzanine NC632M dual-port NICs (total 4 per host).

NICs connected to an HP C3000 blade enclosure and GbE2c HP (Nortel) interconnect switches, which are then uplinked to an HP 5304 modular switch.

ESXi 4.1 Update 1 with Emulex driver update

The issue I am seeing is that when Windows 2008 R2 boots (on either a new install or a P2V'd install) it logs a NETLOGON 5719 error (unable to establish a secure connection) in the event log, followed closely by a Windows Time lookup failure warning. I can log in OK and the errors do not recur; Group Policy applies OK and the time service syncs a couple of seconds after the initial warning.
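To check whether a given boot produced the event, the System log can be queried from the command line. The following is only a sketch: it builds (and prints, but does not run) a `wevtutil` query for event ID 5719. The function name and the default of five events are illustrative assumptions, not anything taken from this thread.

```python
from typing import List


def build_5719_query(max_events: int = 5) -> List[str]:
    """Build a wevtutil command that lists the newest 5719 events in the System log.

    The function name and the default count are illustrative choices.
    """
    # XPath filter on the event ID only; add further conditions as needed.
    xpath = "*[System[(EventID=5719)]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # filter expression
        f"/c:{max_events}",          # maximum number of events to return
        "/rd:true",                  # newest events first
        "/f:text",                   # plain-text output
    ]


if __name__ == "__main__":
    # Printing only; on the affected guest the list could be passed to subprocess.run().
    print(" ".join(build_5719_query()))
```

Comparing the timestamp of the newest event with the boot time shows whether the error is limited to startup, as described above.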

This only happens if the NIC is set to a static IP address. If I set it to DHCP (same address details as the static one), I do not get any errors at all.

What appears to be happening is that NETLOGON is starting before the network has completely initialised. I have tried making it depend on other services and disabling PortFast/STP on the switches, and have seen no change. I have also read an MS article which says the error can be ignored; however, I don't like random errors and I have not seen this before in similar deployments with similar hardware. We don't have any issues on physical servers running the same OS, even in the same blade enclosure, or on Windows 2003 VMs.
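For anyone who wants to repeat the dependency experiment, service dependencies can be set with `sc.exe config <service> depend= <list>`, where the names are separated by forward slashes. The sketch below only builds the command line. The dependency names in the usage example are assumptions for illustration (check the service's current list with `sc.exe qc Netlogon` first, because `depend=` replaces the whole list), and as noted above this change made no difference in my case.

```python
from typing import List, Sequence


def build_depend_command(service: str, dependencies: Sequence[str]) -> List[str]:
    """Build an sc.exe command that sets the dependency list of a service.

    Note that 'depend=' replaces the existing list rather than appending to it.
    """
    if not dependencies:
        raise ValueError("at least one dependency is required")
    # sc.exe expects the option name and its value as separate tokens,
    # with multiple dependencies joined by forward slashes.
    return ["sc.exe", "config", service, "depend=", "/".join(dependencies)]


if __name__ == "__main__":
    # Illustrative values only: the dependency names are assumptions.
    example = build_depend_command(
        "Netlogon", ["LanmanWorkstation", "LanmanServer", "Dnscache"]
    )
    print(" ".join(example))
```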

Has anyone else noticed this behaviour?

24 Replies

vmroyale
Immortal

Hello and welcome to the forums.

Note: This discussion was moved from the VMware ESXi 4 community to the Virtual Machine & Guest OS community.

Do you have the VMware Tools installed in these guests, and what NIC are you using in them?

Good Luck!

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com

lilwashu
Contributor

I do have VMware Tools installed and have tried the E1000 and VMXNET3 adapters without any change in behaviour.

vMikee386
Contributor

Has there been any movement on this?  I am seeing a similar issue with W2K8-R2.

-M

jrkennemer
Contributor

Any movement?  We're seeing the same thing on our new environment (Cisco UCS and EMC VNX), and after a couple of weeks working with VMware, Cisco, Microsoft and our integrator, we're being told our system is "too fast".  Essentially, since we're in test mode and only have a few VMs running on the environment, there's not enough load and Windows boots too quickly.  This causes our Exchange services to fail, which is impeding progress on our pilot.

Has anyone else run into this situation?

lilwashu
Contributor

No progress yet, and VMware support have been worse than useless so far. I will be calling them tomorrow to register a complaint about this particular aspect.

In the meantime we have set the Exchange services to delayed start, which has worked around it so far.

If anyone wants my case reference to give to VMware support, PM me and hopefully we will get somewhere.

mila30
Contributor

Hi, I hope everything is working for you now.  I would like to hear about any developments, because once your case is solved the rest of us will know what to do if it happens to us too.  I look forward to reading updates on your status.

cdsmm
Contributor

We are having the exact same symptoms: NETLOGON 5719 and some time-service errors on boot.

The one difference is that we are running on physical servers, but only the new dual six-core servers are affected; our older servers don't have these issues. We are running 2008 R2 Enterprise SP1. On these physical servers the problem only manifests itself when we have LACP enabled. We have dozens and dozens of other servers with the same setup (2008 R2, Cisco switches) on hardware that is a couple of years older, and they don't have the problem.

We've followed all other suggestions found on the internet but nothing has worked. We've also been trying to have the netlogon service depend on other services, and delay start, etc. But nothing has helped.

The impact of these NETLOGON errors for us is that some of our computer GPOs are not being consistently applied, so we cannot ignore them.

I have found some forum posts on an HP and a separate IBM site (we have Dell servers) and everyone seems to be having the exact same issue:

http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14594253&tstart=0

http://h30499.www3.hp.com/t5/ProLiant-Servers-Netservers/DL360-G7-NC382i-Event-5719-every-boot/td-p/...

We've opened a call with Dell, and I will be opening a call with MS here shortly, but based on what I am seeing, the NETLOGON service seems to be starting so fast that the network connections aren't fully available yet.

vMikee386
Contributor

I truly believe that this is something MS needs to address. They “own” the services in question and I cannot understand why they would attempt to use one before it has started.

lilwashu
Contributor

The problem does not seem to be trying to use a service before it has started, rather that the service starts before the network hardware has initialised fully, which would put the problem in the hardware vendor's court as far as I can see.

vMikee386
Contributor

Unless it is an MS (or MS-certified) driver. The driver should tell the system when the HW is ready for use. The service should wait until the driver gives the OK. If the driver does not start within an allotted time, then there should be a hard error.

ANorton
Enthusiast

Has there been any update on this anywhere? We are having the same issue.

vMikee386
Contributor

No change that I am aware of. I installed the UCS and VNX data center named above and the issue went away once we placed a workload on the system.  Sounds a bit silly, but place a workload (even if it is synthetic via something like IOmeter) and see if your issue persists.

lilwashu
Contributor

We never got around the problem, and we also don't see it on "slower" storage such as NFS/iSCSI. VMware blamed Microsoft and Microsoft blamed VMware.

Interestingly, we don't see the problem with the Hyper-V or XenServer hypervisors on the same platform.

DuncanClay
Contributor

Same problem: Windows Server 2008 R2 SP1 Standard, vCenter 4.1.0, VMware Tools 8.3.2, VMXNET 3 adapter, DHCP.  Spent the best part of a day trying various Microsoft KB solutions.

Our problem is that SQL Server 2008 R2 was following the NETLOGON Event ID 5719 with an Event ID 7000 (The SQL Server (MSSQLSERVER) service failed to start due to the following error: The service did not start due to a logon failure.) and 7038 (The MSSQLSERVER service was unable to log on as domain\user with the currently configured password due to the following error: There are currently no logon servers available to service the logon request.).

Whilst I have not been able to stop the 5719 error, setting the startup type of MSSQLSERVER to Automatic (Delayed Start) at least means SQL Server is starting automatically after a reboot.  I consider this a workaround at best, but if you have automatic services that are failing, it might be worth a try.
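The same startup type can be set from the command line with `sc.exe config <service> start= delayed-auto`, which is handy when several services need the workaround. This is a minimal sketch that only builds and prints the commands; the helper name is made up for illustration, and the service list in the usage example contains just the MSSQLSERVER service named above.

```python
from typing import List


def build_delayed_start_command(service: str) -> List[str]:
    """Build an sc.exe command that sets a service to Automatic (Delayed Start)."""
    # sc.exe expects 'start=' and its value as separate tokens.
    return ["sc.exe", "config", service, "start=", "delayed-auto"]


if __name__ == "__main__":
    # Only the service named in this post; extend the list for other failing services.
    for name in ["MSSQLSERVER"]:
        print(" ".join(build_delayed_start_command(name)))
```

On the affected server each list could be passed to `subprocess.run()` from an elevated prompt. Like the GUI setting, this works around the failed service start and does not stop the 5719 event itself.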

goppi
Enthusiast

Same problem here with Win2008R2 SP1 on vSphere 5.1 and no solution so far.

PPH01
Contributor

Same issue here with Windows 2008 R2 SP1 on ESXi 5.1.  The underlying storage didn't seem to matter, as I tested using both the local disks (SAS) and the SAN (Nimble array) and both showed the issue.  Tried with and without VMware Tools installed, with the same result.

Works fine with DHCP and breaks with a static IP address.

Tried a number of different MS fixes, none made any difference.

http://support.microsoft.com/kb/938449 - none of the options fixed issue.

STP is off on our HP ProCurve switches.

We're also running XenServer 5.6 SP2 (migrating off XenServer to VMware), with no issue in our XenServer environment or on our physical machines, which points to VMware.  Both XenServer and VMware are running on the same ProCurve switches.  All VM traffic is on dedicated ports (not shared with management traffic).

Found that using DHCP reservations was a workaround, though that doesn't help for DCs.

How are others getting around this issue on DCs?

ANorton
Enthusiast

Try this

1. Click Start, type regedit in the Start Search box, and then press ENTER.

2. Locate the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

3. On the Edit menu, point to New, and then click DWORD Value.

4. Type ArpRetryCount 

5. Right-click the ArpRetryCount registry entry, and then click Modify 

6. In the Value data box, type 0 and then click OK.

7. Exit Registry Editor.

8. Restart the machine.
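The same change can be scripted instead of clicking through Registry Editor. The sketch below builds the equivalent `reg add` command for the key and value named in the steps above; the function name is made up for illustration, and nothing is executed, the command is only printed.

```python
from typing import List

# Key and value name exactly as given in the manual steps above.
TCPIP_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_arp_retry_command(retry_count: int = 0) -> List[str]:
    """Build a reg.exe command that sets the ArpRetryCount DWORD value."""
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    return [
        "reg", "add", TCPIP_PARAMETERS_KEY,
        "/v", "ArpRetryCount",   # value name
        "/t", "REG_DWORD",       # value type
        "/d", str(retry_count),  # value data
        "/f",                    # overwrite without prompting
    ]


if __name__ == "__main__":
    print(" ".join(build_arp_retry_command()))
```

Run the printed command from an elevated prompt on the guest, then restart the machine as in step 8.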

PPH01
Contributor

Worked great on two servers now (to include a new Exchange 2010 box)!  Thank you!

I'm not sure I fully understand what this does.

http://technet.microsoft.com/en-us/library/cc957526.aspx

http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2networking/thread/d7bda315-6366-...

As discussed over phone earlier, for conflict detection, the client computer uses the Address Resolution Protocol (ARP) request to determine whether the IP address is being used. However, a ProxyArp device might incorrectly answer the ARP request, and an IP address conflict is reported.

When this problem occurs, the ProxyArp device responds to all ARP requests.
To work around this problem, we can turn off gratuitous ARP by setting the value of the ARPRetryCount registry entry to 0. To do this, follow these steps.

Is this indicating a deeper issue in our network possibly?

The DCs all appear fine (DCDiag, etc.), and none of the systems outside of VMware seem to be having issues.

Thanks again!

goppi
Enthusiast

"Is this indicating a deeper issue in our network possibly?"

From the description of what the setting does, it is indeed an indication of a general network issue.

It seems that a device on your network is sending incorrect ARP replies.

The feature you turned off is there to prevent duplicate IP addresses on your network.
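To make the mechanism concrete, here is a toy model of the conflict check as described in the quoted explanation. It is not how Windows implements it: the responder functions, the MAC and IP addresses and the retry value of 3 are all made-up illustrations. The point is only that a device answering every ARP request triggers a false conflict, and that a retry count of 0 means no probe is sent at all.

```python
from typing import Callable, Iterable, Optional

# A responder receives the probed IP and returns a MAC address if it answers.
Responder = Callable[[str], Optional[str]]


def proxy_arp_device(probed_ip: str) -> Optional[str]:
    """Hypothetical misbehaving device: answers ARP for every address."""
    return "00:00:5e:00:53:01"


def ordinary_host(probed_ip: str) -> Optional[str]:
    """Hypothetical well-behaved host that only answers for its own address."""
    return "00:00:5e:00:53:02" if probed_ip == "192.0.2.20" else None


def conflict_reported(my_ip: str, responders: Iterable[Responder],
                      arp_retry_count: int) -> bool:
    """Return True if any probe for our own address gets an answer."""
    responders = list(responders)
    for _ in range(arp_retry_count):  # zero retries means no probe is sent
        if any(responder(my_ip) is not None for responder in responders):
            return True
    return False


if __name__ == "__main__":
    network = [ordinary_host, proxy_arp_device]
    print(conflict_reported("192.0.2.10", network, arp_retry_count=3))
    print(conflict_reported("192.0.2.10", network, arp_retry_count=0))
```

In this model the registry change hides the symptom but the device that answers for everyone is still on the network, which is why it is worth finding.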
