VMware Cloud Community
usafseic
Enthusiast

How to Troubleshoot "Errors in Active Directory Operations" Messages

I am getting this message while trying to join an ESXi 6 host to a domain.  I see a lot of KB articles, tips, forum entries, etc. on how to solve specific problems, but is there a recommended place to start in the host's log files that will give me the best information for narrowing this down to one of those "specific problems"?

Accepted Solutions
usafseic
Enthusiast

So here's what may be the problem.  VMware support asked for the Likewise and ESXi logs again, so I went back to the KB article on setting up logging for the Likewise agent (1026554) to jog my memory on getting that configured.  What I found was that the article had been updated this week (6/2/15).  Under the ESXi 6.0 section, there is now a new step that says "Start the lwsmd service by running this command":

/etc/init.d/lwsmd start

I don't recall ever taking that action before when I went through this process, but lo and behold, after starting that daemon, I was able to join all the domains that had failed previously.  (Judging from the messages I got when I ran the command, the daemon had not been running.)  So this might be a known issue that has to be corrected in a future patch, where the service should either be set to run on startup or be started and kept running whenever a domain join is requested.

Furthermore, there's another article that says that you need to set that service to start automatically using

chkconfig lwsmd on

Not having that set may have been the root cause of the service not starting whenever I rebooted the host.  Because it's such a low-level service, I had no idea whether it was running or not.

Now this whole thing might be completely off the mark and may or may not work for anyone else, but I can definitely say that the host joined after I manually ran that service startup script on the host, after not being able to join via many other troubleshooting actions.

12 Replies
Dee006
Hot Shot

May I know which user credential format you are using when adding the host to the domain, and whether all required ports are open in your environment?

usafseic
Enthusiast

I'm using the UPN format (user@do.main.com), and yes, I know the NetBIOS-style reference (DO-MAIN\user) doesn't work.

The Active Directory service is running, and the firewall is in its default configuration with the "Active Directory All" item checked (ports 88, 123, 137, 139, 389, 445, 464, 3268, and 51915 outbound).
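
For anyone who wants to double-check the "ports are open" part from the network side, here is a minimal Python sketch that tries a plain TCP connection to a domain controller on the TCP ports from that list. It is only a reachability probe run from some machine on the same network segment, not something taken from a VMware KB. The function names are made up for illustration. Ports 123 and 137 are left out because they are normally UDP services, and 51915 is left out because I am not sure what answers on it.

```python
import socket

# TCP ports taken from the "Active Directory All" rule quoted above.
# 123 (NTP) and 137 (NetBIOS name service) are normally UDP, so a TCP
# probe says nothing about them; 51915 is omitted as an unknown.
AD_TCP_PORTS = [88, 139, 389, 445, 464, 3268]


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_ports(host, ports=AD_TCP_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (refused, filtered, timed out)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `probe_ports("dc01.do.main.com")` (a placeholder name, not one from this thread) returns a dict such as `{88: True, 139: True, ...}`. A `False` on 389 would at least be consistent with an LDAP "server down" style error, although an open port does not prove the join will work.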

Dee006
Hot Shot

Cool. To be frank, I haven't joined my test environment to AD. Maybe I should try. Let me see if I come across similar issues.

vJeff
VMware Employee

usafseic,

I have been spending a lot of time troubleshooting this for a large customer where we have nearly 500 hosts to join to the domain.  Here are some of the things I have had to do and check to get it working.

First of all, see this article for enabling logging for the Likewise agents.  These are the log files you can review; however, they haven't been very helpful for me.  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102655...

One thing I've noticed is that I am much more successful joining the domain immediately after a fresh reboot.  My new process is to reboot the host, join the domain, then reboot again.  If I let the host sit too long after a reboot before joining the domain, I get the same error as you do.  That has solved 80% of my issues so far.  It is not ideal, but it seems to work.

Check that you have both the proper host name and domain name in the host's DNS and Routing configuration.  If the name is wrong or the domain is missing, the join has failed for me.

In some rare instances I had to set a preferred domain controller because the host was trying to authenticate to a DC in a location blocked by a firewall.  In the host's advanced configuration -> UserVars, you can enter the name of the preferred domain controller for the setting UserVars.ActiveDirectoryPreferredDomainControllers.  This has helped on a couple of occasions where there wasn't a local functioning domain controller.

usafseic
Enthusiast

Wow, thanks for that info.  Here's what the log file is showing:

20150429210955:ERROR:lsass: Failed to run provider specific request (request code = 12, provider = 'lsa-activedirectory-provider') -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 37545

20150429211010:ERROR:lsass: Failed to run provider specific request (request code = 12, provider = 'lsa-activedirectory-provider') -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 37582

20150429211015:ERROR:lsass: Failed to run provider specific request (request code = 8, provider = 'lsa-activedirectory-provider') -> error = 40286, symbol = LW_ERROR_LDAP_SERVER_DOWN, client pid = 34807

20150429211017:ERROR:lsass: Failed to run provider specific request (request code = 12, provider = 'lsa-activedirectory-provider') -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 37604

LW_ERROR_LDAP_SERVER_DOWN doesn't make much sense, because the host can resolve the hostname.

Expanding that specific error, I see:

20150429211510:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 10

20150429211510:INFO:netlogon: Determining the current time for domain 'XXX.YYY.COM'

20150429211510:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 10

20150429211511:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 1001

20150429211511:INFO:netlogon: Filtering list of 9 servers with list of 0 black listed servers

20150429211511:INFO:netlogon: Filtering list of 5 servers with list of 0 black listed servers

20150429211515:ERROR:lsass: Failed to run provider specific request (request code = 8, provider = 'lsa-activedirectory-provider') -> error = 40286, symbol = LW_ERROR_LDAP_SERVER_DOWN, client pid = 34582

And then, with logging set to VERBOSE, there's a GSS-API error:

20150429211753:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 10

20150429211753:VERBOSE:lsass: Affinitized to DC 'XXX.YYY.ZZZ.com' for join request to domain 'XXX.YYY.COM'

20150429211753:VERBOSE:lwreg: Registry::sqldb.c RegDbOpenKey() finished

20150429211753:VERBOSE:lwreg: Registry::sqldb.c SqliteGetValueAttributes_Internal() finished

20150429211753:VERBOSE:lwreg: Registry::sqldb.c RegDbOpenKey() finished

20150429211753:VERBOSE:lwreg: Registry::sqldb.c SqliteGetValueAttributes_Internal() finished

20150429211753:INFO:netlogon: Determining the current time for domain 'XXX.YYY.COM'

20150429211753:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 10

20150429211753:INFO:netlogon: Looking for a DC in domain 'XXX.YYY.COM', site '<null>' with flags 1001

20150429211753:INFO:netlogon: Filtering list of 9 servers with list of 0 black listed servers

20150429211753:INFO:netlogon: Filtering list of 5 servers with list of 0 black listed servers

20150429211753:VERBOSE:lwio: GSS-API error calling gss_init_sec_context: 1 (The routine must be called again to complete its function)

20150429211755:VERBOSE:lsass: Permission granted for (uid = 0, gid = 0, pid = 38118) to open LsaIpcServer

20150429211755:VERBOSE:lsass-ipc: (session:e8153487dc6055af-a9956bd6be374f20) Accepted association 0x3d410b40

20150429211755:VERBOSE:lwreg: Registry::sqldb.c RegDbOpenKey() finished

20150429211755:VERBOSE:lwreg: Registry::sqldb.c SqliteGetValueAttributes_Internal() finished

20150429211755:ERROR:lsass: Failed to run provider specific request (request code = 12, provider = 'lsa-activedirectory-provider') -> error = 2692, symbol = NERR_SetupNotJoined, client pid = 38118

20150429211755:VERBOSE:lsass-ipc: (assoc:0x3d410b40) Dropping: Connection closed by peer

20150429211758:ERROR:lsass: Failed to run provider specific request (request code = 8, provider = 'lsa-activedirectory-provider') -> error = 40286, symbol = LW_ERROR_LDAP_SERVER_DOWN, client pid = 34581

20150429211758:VERBOSE:lsass-ipc: (assoc:0x3d412618) Dropping: Connection closed by peer

And finally, at the lowest (DEBUG) level, I see:

20150429212849:DEBUG:lsass:KtLdapQuery():ktldap.c:149: Ldap error code: 4294967295

20150429212849:DEBUG:lsass:KtLdapGetBaseDnA():ktldap.c:258: Error code: 40286 (symbol: LW_ERROR_LDAP_SERVER_DOWN)

20150429212849:DEBUG:lsass:KtLdapGetBaseDnW():ktldap.c:295: Error code: 40286 (symbol: LW_ERROR_LDAP_SERVER_DOWN)

20150429212849:DEBUG:lsass:LsaSaveMachinePassword():join.c:2043: Error code: 40286 (symbol: LW_ERROR_LDAP_SERVER_DOWN)

20150429212849:DEBUG:lsass:LsaJoinDomainInternal():join.c:778: Error code: 40286 (symbol: LW_ERROR_LDAP_SERVER_DOWN)

I also tried the domainjoin-cli command, and it returns "The DC closed an LDAP connection in the middle of a query" and LW_ERROR_LDAP_CONSTRAINT_VIOLATION [code 0x00009d7b].

So I'm opening a ticket with our domain admins to see whether the object permissions are messed up or whether anything shows up on the back end.
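
Since the Likewise logs get noisy quickly, a rough Python sketch like the one below can tally which error symbols show up and from which component. It only assumes the two line shapes visible in the excerpts above (`symbol = NAME` and `(symbol: NAME)`). The function names are made up for illustration, and nothing here is an official log format description. The small `as_signed32` helper is there because "Ldap error code: 4294967295" reads as -1 when treated as a signed 32-bit value, which in OpenLDAP's headers is the "server down" code; whether Likewise uses that same numbering is an assumption on my part.

```python
import re
from collections import Counter

# Matches both shapes seen in the excerpts:
#   "... error = 40286, symbol = LW_ERROR_LDAP_SERVER_DOWN, ..."
#   "... Error code: 40286 (symbol: LW_ERROR_LDAP_SERVER_DOWN)"
SYMBOL_RE = re.compile(r"symbol\s*[=:]\s*([A-Za-z_][A-Za-z0-9_]*)")

# Leading "YYYYMMDDhhmmss:LEVEL:component:" prefix of each log line.
LINE_RE = re.compile(r"^(\d{14}):([A-Z]+):([^:]+):")


def tally_symbols(lines):
    """Count occurrences of each (component, symbol) pair in the log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        head = LINE_RE.match(line)
        sym = SYMBOL_RE.search(line)
        if head and sym:
            counts[(head.group(3), sym.group(1))] += 1
    return counts


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value
```

Feeding it the lines of a saved log file (for example `tally_symbols(open(path))`, with `path` pointing at wherever you saved the log) gives a quick picture of whether the failures are dominated by NERR_SetupNotJoined, LW_ERROR_LDAP_SERVER_DOWN, or something else.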

VIR2AL3X
Enthusiast

I am currently having this same issue and am unable to resolve it using the method provided.  I would gladly welcome some assistance to get this working in my nested lab setup.

CNI0
Enthusiast

Shamsher0
Contributor

Hi, please enter your username as "username@yrdomain.com" when joining the domain on the ESXi server. Good luck!

ibunne
Contributor

Hi, for anyone still having this problem (i.e., can't join an ESXi host or the VCSA to AD), this post solved it for me: "Error joining vCenter Server Appliance to Active Directory" (VCDX56).

"You need to enable SMB version 1 in Windows Server 2012/2012 R2 from the registry".

Hope this helps someone. It was a real pain to find.

kevinstiegler
VMware Employee

Our site attempted joining the domain after enabling SMBv2 and the process continued to fail.  After reversing the process and setting SMB2Enabled to '0', the lsass service fails to start. Can anyone verify that simply changing the "1" to "0" turns it off?

GBartsch
Enthusiast

Folks,

Regarding turning on SMBv1 on an Active Directory domain controller...

This is a BAD idea.

It is also not something that we recommend.  The reason is as simple as EternalBlue.

Malware is known to use SMBv1 and Microsoft has officially recommended that it be turned off.

If you have an issue where you cannot join an ESXi host or vCenter to AD without SMBv1, please contact VMware GSS.
