VMware Cloud Community
lilwashu
Contributor

Windows NETLOGON 5719 at Startup

I am seeing what appears to be either a networking or a guest startup issue with Windows 2008 R2 SP1 guest machines on my vSphere Essentials setup. The configuration is:

HP BL460C G7 servers, 72GB RAM, with built-in Emulex dual-port 10Gb NICs and mezzanine NC632M dual-port NICs (4 in total per host).

NICs connected to an HP C3000 blade enclosure and GBe2C HP (Nortel) interconnect switches, which are then uplinked to an HP 5304 modular switch.

ESXi 4.1 Update 1 with the Emulex driver update.

The issue is that when Windows 2008 R2 boots (on either a new install or a P2V'ed one), it logs a NETLOGON 5719 error (unable to establish a secure connection) in the event log, followed closely by a Windows Time lookup failure warning. I can log in fine and the errors do not recur; Group Policy applies correctly, and the time service syncs a couple of seconds after the initial warning.

This only happens if the NIC is set to a static IP address. If I set it to DHCP (same address details as the static one), I do not get any errors at all.

What appears to be happening is that NETLOGON is starting before the network has completely initialised. I have tried making it depend on another service and disabling portfast/STP on the switches, and have seen no change. I have also read an MS article which says the error can be ignored, but I don't like random errors, and I have not seen this before in similar deployments with similar hardware. We don't have any issues on physical servers running the same OS, even in the same blade enclosure, or on Windows 2003 VMs.

Has anyone else noticed this behaviour?

24 Replies
ANorton
Enthusiast

It has to do with the Duplicate Address Detection that Windows performs. The trace data showed Netlogon trying to initiate communication before Duplicate Address Detection had finished; that process sends three ARP requests for the IP address assigned to the server and waits for responses. Changing the ARP retry count lets detection finish sooner, before Netlogon tries to start.

Trust me, it took a while to figure out; I worked on this with Microsoft and their OS developers to find it.

PPH01
Contributor

This makes perfect sense now. Thank you!

In my case, I believe this indicates a flaw in the boot order of Microsoft's products (although, depending on the scenario, it could also indicate an ARP issue with a network device).

  • The netlogon service is not forced to wait for the network to fully initialize.
  • The network waits for Duplicate Address Detection to complete before it initializes.

Therefore, on systems that boot quickly enough, the netlogon service can start before Duplicate Address Detection completes (and thus before the network has initialized).

So the ArpRetryCount entry tells the server not to check for duplicate addresses; the network therefore initializes faster and is ready before the netlogon service starts.
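The race described above can be made concrete with a toy model. This is only a sketch: the function names, the one-second probe interval and the start times are hypothetical values chosen for illustration, not measurements from Windows. The only figure taken from this thread is the count of three ARP requests.

```python
def address_usable_at(stack_start: float, arp_retry_count: int,
                      probe_interval: float = 1.0) -> float:
    """Time (seconds after boot) at which the static address becomes usable.

    Assumption: each ARP probe costs one probe_interval and the address is
    held back until all probes are done. The interval is a made-up figure.
    """
    return stack_start + arp_retry_count * probe_interval


def netlogon_starts_too_early(netlogon_start: float, stack_start: float,
                              arp_retry_count: int,
                              probe_interval: float = 1.0) -> bool:
    """True if Netlogon starts before the static address is usable."""
    ready = address_usable_at(stack_start, arp_retry_count, probe_interval)
    return netlogon_start < ready


if __name__ == "__main__":
    # Hypothetical fast-booting VM: stack up at 5 s, Netlogon starts at 6 s.
    print(netlogon_starts_too_early(6.0, 5.0, arp_retry_count=3))   # True
    print(netlogon_starts_too_early(6.0, 5.0, arp_retry_count=0))   # False
    # Hypothetical slow-booting host: Netlogon does not start until 20 s.
    print(netlogon_starts_too_early(20.0, 5.0, arp_retry_count=3))  # False
```

In this model a slower boot and a retry count of 0 both avoid the error for the same reason: the address is usable before Netlogon asks for it.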

I believe this doesn't show up in our XenServer or physical environments because of the speed difference. Our VMware environment is MUCH faster (18 seconds to fully boot Server 2012; the SAN is under very little load currently) than our XenServer environment, which takes ~95 seconds.

A huge help, ANorton! Thanks.

de2rfg
Enthusiast

ANorton wrote:

Try this

1. Click Start, type regedit in the Start Search box, and then press ENTER.

2. Locate the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

3. On the Edit menu, point to New, and then click DWORD Value.

4. Type ArpRetryCount

5. Right-click the ArpRetryCount registry entry, and then click Modify.

6. In the Value data box, type 0 and then click OK.

7. Exit Registry Editor.

8. Restart the machine.
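For repeating these steps on several machines, the same value can be expressed as a .reg file. This is a minimal sketch, assuming the key and value name given in the steps above; the function name `build_reg_file` is made up for illustration. Note that a later reply in this thread reports Microsoft advising against changing this parameter, so treat it as a workaround to test, not a recommendation.

```python
# Registry key taken from step 2 above.
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(value_name: str = "ArpRetryCount", value: int = 0) -> str:
    """Return the text of a .reg file that sets one DWORD value under REG_KEY."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{REG_KEY}]",
        f'"{value_name}"=dword:{value:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Saving the printed text as a .reg file and importing it should have the same effect as steps 2 to 6; the restart in step 8 is still needed.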

Thank you! You saved me a lot of time.

I had the same symptoms in our new VMware 5.1 U1 cluster with fresh hardware. New VMs, P2V'ed VMs, and old VMs that were moved into the cluster all showed Netlogon, Group Policy, NTP, and similar errors. With VMware limits on CPU/memory/disk the errors went away, so it clearly looked like a timing issue.

The netlogon debug log showed that the system had no IP address at that point (we use static IP settings):

07/25 14:43:59 [SESSION] Winsock Addrs: (0) List is now empty.
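To check several boots for this symptom, the debug log can be scanned for the same message. This is a rough sketch, assuming every relevant entry has exactly the shape of the line quoted above; the function name is illustrative, and the second sample line is invented to show a non-matching entry.

```python
import re

# Matches entries shaped like the line quoted above:
# "MM/DD HH:MM:SS [SESSION] Winsock Addrs: (0) List is now empty."
EMPTY_ADDR_LIST = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[SESSION\] "
    r"Winsock Addrs: \(0\) List is now empty\."
)


def find_empty_address_events(lines):
    """Return the timestamps of entries where Netlogon saw no IP addresses."""
    stamps = []
    for line in lines:
        match = EMPTY_ADDR_LIST.match(line.strip())
        if match:
            stamps.append(match.group("stamp"))
    return stamps


if __name__ == "__main__":
    sample = [
        "07/25 14:43:59 [SESSION] Winsock Addrs: (0) List is now empty.",
        "07/25 14:44:02 [MISC] unrelated entry (invented for this example)",
    ]
    print(find_empty_address_events(sample))
```

On a real system the lines would come from the Netlogon debug log file instead of the hard-coded sample.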

With DHCP everything was working without problems.

The ArpRetryCount=0 setting works perfectly for me, but I still think it's a workaround. Is there any official MS article that describes the (timing) problem and an official solution?

de2rfg
Enthusiast

We received an answer from MS about the problem and the ArpRetryCount registry key. They know about the problem, and there is no official solution. They also advised us not to change the ARP parameter because of the negative effects this could have. So we are back to square one...

TodorIotov
Contributor

Here is the closest thing to an official fix for this issue: KB938449. I've tested the ExpectedDialupDelay method, but it did not work, which is unfortunate, since it looked like the best solution so far.
