VMware Cloud Community
CinciTech
Enthusiast

ESXi 6.7U1 refusing to join Active Directory Domain

I have two ESXi 6.7U1 vhosts which I would like to join to a domain; I'm starting with vhost1 (IP=##.##.##.2).  The domain runs on two Windows Server 2016 domain controllers (VMs) named dc1 and dc2, with IPs ##.##.##.10 and ##.##.##.11 respectively.  SMBv1 is disabled on both domain controllers.  I have removed IPv6 everywhere I could find it on the network, including on the hosts and their VMs.  I have no indication that the domain is otherwise nonfunctional, but in the past I have had to re-sync the group policy / SYSVOL shares, so it's not impossible that my problem lies with the DCs somehow.

I have tried to join the domain from both hosts, with identical results.  Joining via the web UI (Manage > Authentication > Join Domain) results in this message in Recent Tasks: "failed - The specified domain either does not exist or could not be contacted."  I then went to the CLI (logged in as root).  My session is pasted below; most of the commands are copied from pages I found by Googling, including solutions that worked for other people but return only (multiple) errors for me.

[root@vhost1:~] /etc/init.d/lwsmd start

    Starting Likewise Service Manager [memory reservation set] SUCCESS

    [Setting SMBv2 enabled to true]  [starting lsass service] Starting service: lsass

    ...ok

[root@vhost1:~] chkconfig lwsmd on

[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator

    Joining to AD Domain:   domain.local

    With Computer DNS Name: vhost1.domain.local

   

    administrator@DOMAIN.LOCAL's password:

   

    Error: Lsass Error [code 0x00000043]

    Network name not found.. Failure to lookup a domain name ending in ".local" may be the result of

    configuring the local system's hostname resolution (or equivalent) to use Multi-cast DNS. Please refer

    to the Likewise manual at

    http://www.likewise.com/resources/documentation_library/manuals/open/likewise-open-guide.html#Config...

    witch for more information.

[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator@domain.local

    Joining to AD Domain:   domain.local

    With Computer DNS Name: vhost1.domain.local

    administrator@DOMAIN.LOCAL's password:

    Error: ERROR_GEN_FAILURE [code 0x0000001f]

[root@vhost1:~] /usr/lib/vmware/likewise/bin/lwsm set-log file /var/log/likewise.log

[root@vhost1:~] /usr/lib/vmware/likewise/bin/lwsm set-log-level info

[root@vhost1:~] /etc/init.d/lwsmd stop

    watchdog-lwsmd: PID file /var/run/vmware/watchdog-lwsmd.PID does not exist

    watchdog-lwsmd: Unable to terminate watchdog: No running watchdog process for lwsmd

    Stopping Likewise Service Manager [memory reservation released] ...failed

[root@vhost1:~] /etc/init.d/lwsmd start

    Starting Likewise Service Manager [memory reservation set] SUCCESS

    [Setting SMBv2 enabled to true]  [starting lsass service] Starting service dependency: netlogon

    Starting service dependency: lwio

    Starting service dependency: rdr

    Starting service: lsass

    ...ok

[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator

    Joining to AD Domain:   domain.local

    With Computer DNS Name: vhost1.domain.local

   

    administrator@DOMAIN.LOCAL's password:

    Error: DNS_ERROR_BAD_PACKET [code 0x0000251e]

    A bad packet was received from a DNS server. Potentially the requested address does not exist.

[root@vhost1:~] cat /etc/hosts

    # Do not remove the following line, or various programs

    # that require network functionality will fail.

    127.0.0.1      localhost.localdomain localhost

    ::1           localhost.localdomain localhost

    ##.##.##.2    vhost1.domain.local vhost1

[root@vhost1:~] /etc/init.d/lsassd stop

    -sh: /etc/init.d/lsassd: not found

[root@vhost1:~] esxcli network ip dns server list

    DNSServers: ##.##.##.10, ##.##.##.11

[root@vhost1:~] cat /etc/krb5.conf

    [libdefaults]

        default_tgs_enctypes = AES256-CTS AES128-CTS RC4-HMAC

        default_tkt_enctypes = AES256-CTS AES128-CTS RC4-HMAC

        preferred_enctypes = AES256-CTS AES128-CTS RC4-HMAC

        allow_weak_crypto = true

        dns_lookup_kdc = true

        pkinit_kdc_hostname = <DNS>

        pkinit_anchors = DIR:/etc/likewise/trusted_certs

        pkinit_cert_match = <EKU>msScLogin

        pkinit_eku_checking = kpServerAuth

        pkinit_win2k_require_binding = false

        pkinit_identities = PKCS11:/usr/lib/vmware/likewise/lib/libpkcs11wrapper.so.0

        default_realm = DOMAIN.LOCAL

    [likewise]

        disable_modifications = false

        version = 1

    [domain_realm]

        .domain.local = DOMAIN.LOCAL

    [realms]

  DOMAIN.LOCAL = {

    auth_to_local = RULE:[1:$0\$1](^DOMAIN\.LOCAL\\.*)s/^DOMAIN\.LOCAL/DOMAIN/

        auth_to_local = RULE:[1:$0\$1](^DOMAIN\.LOCAL\\.*)s/^DOMAIN\.LOCAL/DOMAIN/

        auth_to_local = DEFAULT

    }

    [appdefaults]

        pam = {

        mappings = DOMAIN\\(.*) $1@DOMAIN.LOCAL

        forwardable = true

        validate = true

        }

        httpd = {

        mappings = DOMAIN\\(.*) $1@DOMAIN.LOCAL

        reverse_mappings = (.*)@DOMAIN\.LOCAL DOMAIN\$1

    }

[root@vhost1:~] ping dc1

    PING dc1 (##.##.##.10): 56 data bytes

    64 bytes from ##.##.##.10: icmp_seq=0 ttl=128 time=0.247 ms

    64 bytes from ##.##.##.10: icmp_seq=1 ttl=128 time=0.631 ms

    --- dc1 ping statistics ---

    2 packets transmitted, 2 packets received, 0% packet loss

    round-trip min/avg/max = 0.247/0.439/0.631 ms

[root@vhost1:~] ping dc2

    PING dc2 (##.##.##.11): 56 data bytes

    64 bytes from ##.##.##.11: icmp_seq=0 ttl=128 time=0.307 ms

    64 bytes from ##.##.##.11: icmp_seq=1 ttl=128 time=0.558 ms

    --- dc2 ping statistics ---

    2 packets transmitted, 2 packets received, 0% packet loss

    round-trip min/avg/max = 0.307/0.432/0.558 ms

Vhost1 is a fresh install on new hardware, still running the temporary license.  Vhost2 is more established, licensed as a free ESXi host.  There is no VCSA currently (although I did set one up to show off how nice vMotion/HA is, but that's not part of the Essentials pack so it's cost-prohibitive).  My general question from all this is: what does this tell me I need to troubleshoot next?  What more info should I grab to confirm what is and isn't working?


Accepted Solutions
CinciTech
Enthusiast

Just wanted to tie this one up in case anyone else has this issue.  I spent many hours trying to troubleshoot my (otherwise-working) domain, retrying with different credentials, DCDIAGing until the cows came home.  Unrelated, ESXi 6.7U1 gave me fits when I set it up to use an NFS share on a NAS as a shared datastore; one VM worked okay, then by the time I got to 2 or 3 VMs performance just ground to a halt, which I'd never seen before.  And I checked the NAS forums and they blamed the NAS and me for getting the wrong NAS and called out the fact that I didn't have an SSD in it, even though I've never needed that before...

So anyhow, long story short, I recently installed ESXi 6.7U2, and it resolved all my issues with the stupid NAS/NFS datastore thing.  And I just confirmed that both my vhosts are able to join the domain.  If you run into this same issue with ESXi 6.7U1, I recommend you move on to ESXi 6.7U2, because the older version appears to have been the cause of all my headaches.  Bleeding edge technology indeed!

Thanks all who tried to help for your advice and suggestions.  If nothing else, it helped me decide I wasn't just missing something dumb.
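For anyone who wants to confirm which build a host is running before and after the upgrade, here is a minimal sketch.  It assumes `vmware -v` prints a string of the form "VMware ESXi 6.7.0 build-NNNNNNNN".  The function names are hypothetical, and the build number in the usage comment is my assumption for the 6.7 U2 release, so check it against VMware's published build list before relying on it.

```shell
#!/bin/sh
# Extract the build number from a `vmware -v` style string such as
# "VMware ESXi 6.7.0 build-13006603". Prints nothing if no build is found.
esxi_build() {
    echo "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Succeeds when the build found in $1 is at least the build given in $2.
# Build numbers are only comparable within the same release line (6.7 here).
build_at_least() {
    build=$(esxi_build "$1")
    [ -n "$build" ] && [ "$build" -ge "$2" ]
}

# Example on a host (13006603 is an ASSUMED value for the 6.7 U2 build):
#   build_at_least "$(vmware -v)" 13006603 && echo "at or past the assumed U2 build"
```

The comparison is deliberately simple: it only answers "is this host at or past a given build", which is all that is needed to tell a U1 host from a U2 host.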


16 Replies
CinciTech
Enthusiast

Still not having any luck with this.  Anyone have any suggestions?  Since my original post I've tried modifying the firewall with no luck, and I've confirmed the hostname is vhost1.domain.local (not just vhost1, as I would have expected of a not-yet-joined-to-the-domain PC).  I've also confirmed vhost1 can ping, nslookup, and netcat the domain controller on port 389.

More info:

[root@vhost1:~] ping dc1

     PING dc1 (##.##.##.10): 56 data bytes

     64 bytes from ##.##.##.10: icmp_seq=0 ttl=128 time=0.404 ms

     64 bytes from ##.##.##.10: icmp_seq=1 ttl=128 time=0.507 ms

     64 bytes from ##.##.##.10: icmp_seq=2 ttl=128 time=0.457 ms

     --- dc1 ping statistics ---

     3 packets transmitted, 3 packets received, 0% packet loss

     round-trip min/avg/max = 0.404/0.456/0.507 ms

[root@vhost1:~] nslookup dc1

     Server:    ##.##.##.10

     Address 1: ##.##.##.10

     Name:      dc1

     Address 1: ##.##.##.10

[root@vhost1:~] nslookup dc1.domain.local

     Server:    ##.##.##.10

     Address 1: ##.##.##.10

     Name:      dc1.pbs.local

     Address 1: ##.##.##.10

[root@vhost1:~] nc -z dc1 389

     Connection to dc1 389 port [tcp/ldap] succeeded!    

RickVerstegen
Expert

Can you try to join the Hosts via CLI?

VMware Knowledge Base

Was I helpful? Give a kudo for appreciation!
Blog: https://rickverstegen84.wordpress.com/
Twitter: https://twitter.com/verstegenrick
CinciTech
Enthusiast

Yes.  In my OP I pasted the results to "/usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator". 

CinciTech
Enthusiast

More notes: SMB2 is confirmed enabled on the vhost.  Trying domainjoin-cli with an obviously bad domain results in NERR_DCNotFound [code 0x00000995], which I think tells me that I'm communicating with the DC on some level when I don't get that error.  "domainjoin-cli fixfqdn" seems to mostly remove the FQDN from /etc/hosts.  I'm not totally sure whether the FQDN should or should not be in hosts, but neither option seems to be my smoking gun.

[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]'

+  "SMB2Enabled"      REG_DWORD       0x00000001 (1)

   "EchoInterval"     REG_DWORD       0x0000012c (300)

   "EchoTimeout"      REG_DWORD       0x0000000a (10)

   "IdleTimeout"      REG_DWORD       0x0000000a (10)

   "MinCreditReserve" REG_DWORD       0x0000000a (10)

   "Path"             REG_SZ          "/usr/lib/vmware/likewise/lib/librdr.sys.so"

   "ResponseTimeout"  REG_DWORD       0x00000014 (20)

   "SigningEnabled"   REG_DWORD       0x00000001 (1)

   "SigningRequired"  REG_DWORD       0x00000000 (0)

[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/lwsm restart lwio

     Stopping service reverse dependency: lsass

     Stopping service reverse dependency: rdr

     Stopping service: lwio

     Starting service: lwio

     Starting service reverse dependency: rdr

     Starting service reverse dependency: lsass

[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/domainjoin-cli join --advanced --preview domain.local administrator

     Joining to AD Domain:   domain.local

     With Computer DNS Name: vhost1.domain.local

     [X] [N] join           - join computer to AD

     [X] [N] krb5           - configure krb5.conf

     [X] [N] cache          - manage caches for this host

This last bit of status I've run multiple times, and once the krb5 returned [X] [S], so it thought that was correct at one point in my troubleshooting steps.  I guess I'll do some digging into krb5.conf and see if anything stands out as misconfigured there.
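A small sketch for pulling the still-pending steps out of that preview output.  It assumes the usual Likewise legend, where the second flag is [N] for a step that still needs to run and [S] or [F] for one that is already sufficiently or fully configured; that reading is my assumption, not something the output above states.  The function name `pending_modules` is hypothetical.

```shell
#!/bin/sh
# Read `domainjoin-cli join --advanced --preview` output on stdin and print
# the names of steps that are enabled ([X]) and flagged as necessary ([N]).
# awk splits each line on whitespace, so "[X] [N] join - ..." yields
# $1="[X]", $2="[N]", $3="join".
pending_modules() {
    awk '$1 == "[X]" && $2 == "[N]" { print $3 }'
}

# Example on a host, using the command from this thread:
#   /usr/lib/vmware/likewise/bin/domainjoin-cli join --advanced --preview \
#       domain.local administrator | pending_modules
```

Comparing the list between runs makes it easier to spot when a step such as krb5 flips from [N] to [S], as it did once here.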

CinciTech
Enthusiast

Any thoughts?  I'm still scratching at a whole lot of nothing with this.

MikeStoica
Expert

Are you following the steps in "Configure a Host to Use Active Directory"?

vGuy
Expert

run the below commands:

esxcli network firewall ruleset set -r activeDirectoryAll --enabled 1

esxcli network ip dns search add -d domain.local

esxcli system hostname set --domain=domain.local

and try to join again using /usr/lib/vmware/likewise/bin/domainjoin-cli

see...if it helps

sk84
Expert

First, check the time settings between your ESXi hosts and domain controllers.

Then set the DNS servers for your ESXi hosts to the DNS servers of your domain.

And if there are any firewalls between the ESXi hosts and your domain controllers and domain DNS servers, check that all necessary ports are open:

Active Directory and Active Directory Domain Services Port Requirements | Microsoft Docs
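The port check can be scripted from the ESXi shell.  This is a minimal sketch, not a complete test: it only probes TCP, the helper name `check_ports` and the `PROBE` variable are hypothetical, and the port list in the example (DNS, Kerberos, LDAP, SMB, kpasswd) is an assumed subset of the Microsoft list linked above.

```shell
#!/bin/sh
# Command used for each probe. Defaults to the "nc -z" form used elsewhere
# in this thread; override PROBE to use a different tool.
PROBE=${PROBE:-nc -z}

# check_ports HOST PORT...
# Prints one line per port and returns non-zero if any port failed.
check_ports() {
    host=$1; shift
    failed=0
    for port in "$@"; do
        # $PROBE is left unquoted on purpose so "nc -z" splits into words.
        if $PROBE "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port open"
        else
            echo "$host:$port CLOSED"
            failed=1
        fi
    done
    return "$failed"
}

# Example (TCP only):
#   check_ports dc1 53 88 389 445 464
```

A successful TCP connect only shows that something is listening; it says nothing about UDP or about whether the service behind the port answers correctly.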

--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.
CinciTech
Enthusiast

The domain controllers are VMs on the host, and they're set to sync time via VMware Tools.  They appear to have identical times.

The ESXi hosts are using the correct DNS server IPs.  These settings are set via DHCP static assignment.  I even set the IP/DNS settings to static on the domain controllers (which are also the DNS servers) and that didn't appear to change anything.

I've disabled the ESXi firewall and all firewalls on the DC/DNS servers to be sure that isn't the problem, and I've checked ports are open using netcat ("nc -z dc1 389" above responded that it was successful; other ports in the usual suspects list also connected without issue: 53, 88, etc)

CinciTech
Enthusiast

[root@vhost1:~] esxcli network firewall ruleset set -r activeDirectoryAll --enabled 1

[root@vhost1:~] esxcli network ip dns search add -d domain.local

[root@vhost1:~] esxcli system hostname set --domain=domain.local

[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator

     Joining to AD Domain:   domain.local

     With Computer DNS Name: vhost1.domain.local

     administrator@DOMAIN.LOCAL's password:

     Error: Lsass Error [code 0x00000043]

     Network name not found.. Failure to lookup a domain name ending in ".local" may be the result of

     configuring the local system's hostname resolution (or equivalent) to use Multi-cast DNS. Please refer

     to the Likewise manual at

     http://www.likewise.com/resources/documentation_library/manuals/open/likewise-open-guide.html#Config...

     witch for more information.

Same Lsass error I was getting in the OP.  I have Googled the Multi-cast DNS error and don't believe any of what is described applies to my situation.

CinciTech
Enthusiast

Yes.  And I've joined dozens of ESXi hosts to domains, so I'm really at a loss for what makes this one so unique.

sk84
Expert

Error: Lsass Error [code 0x00000043]

This error code comes from the domain controller. VMware doesn't use these error codes. So something must be wrong on the ADS/PDC side, or the domain controller doesn't like a value supplied by the ESXi host during the domain join.

And, in fact, the Windows error code 0x00000043 is "Network name cannot be found".
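The other codes in this thread decode the same way.  Here is a minimal lookup sketch covering only the four codes that appear in the thread; the function name `decode_join_error` is hypothetical, and the names are the standard Windows error names for those decimal values.

```shell
#!/bin/sh
# Map a hex code as printed by domainjoin-cli (e.g. 0x00000043) to the
# matching Windows error name. Only codes seen in this thread are listed.
decode_join_error() {
    # printf converts the 0x-prefixed hex string to decimal.
    code=$(printf '%d' "$1") || return 2
    case "$code" in
        31)   echo "ERROR_GEN_FAILURE" ;;
        67)   echo "ERROR_BAD_NET_NAME (network name cannot be found)" ;;
        2453) echo "NERR_DCNotFound" ;;
        9502) echo "DNS_ERROR_BAD_PACKET" ;;
        *)    echo "unknown ($code)" ;;
    esac
}

# Example:
#   decode_join_error 0x0000251e
```

Anything not in the table is printed as its decimal value, which is the form the Microsoft system error code lists are indexed by.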

Can you please try the following to check if DNS is really working as expected:

On one ESXi host:

nslookup domain.local

It should return your domain controller(s).

On the domain controller(s):

nslookup vhost1.domain.local

It should return the IP address of your ESXi host, and the responding DNS server should be localhost (127.0.0.1) if your domain controller(s) also have the DNS role installed.
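The lookups above only exercise A records, while AD clients normally locate a domain controller through SRV records, so it may be worth checking those as well.  This sketch only prints the standard SRV names for a domain; the function name `ad_srv_names` is hypothetical, and whether the nslookup on an ESXi host accepts a record-type option is an assumption I have not verified, so the query itself is shown for the Windows side.

```shell
#!/bin/sh
# Print the DNS SRV names that AD clients typically query to locate
# domain controllers and Kerberos services for a domain.
ad_srv_names() {
    domain=$1
    echo "_ldap._tcp.dc._msdcs.$domain"
    echo "_ldap._tcp.$domain"
    echo "_kerberos._tcp.$domain"
    echo "_kpasswd._tcp.$domain"
}

# Example: list the names, then query each one from a domain controller:
#   ad_srv_names domain.local
#   nslookup -type=SRV _ldap._tcp.dc._msdcs.domain.local    (Windows nslookup)
```

Each query should list dc1 and dc2 as targets; a missing or stale record here would fit a "domain could not be contacted" error even when plain name lookups succeed.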

And how do you supply the admin credentials for the domain join?

Have you tried "administrator@domain.local" instead of "domain\administrator" or "domain.local\administrator"? Strangely enough, I have had problems with this notation in the past.

CinciTech
Enthusiast

Here's a fresh run of nslookup (but it matches the OP version):

[root@vhost1:~] nslookup domain.local

     Server:    ##.##.##.10

     Address 1: ##.##.##.10 dc1.domain.local

     Name:      domain.local

     Address 1: ##.##.##.10 dc1.domain.local

     Address 2: ##.##.##.11 dc2.domain.local

[root@vhost1:~] nslookup vhost1.domain.local

     Server:    ##.##.##.10

     Address 1: ##.##.##.10 dc1.domain.local

     Name:      vhost1.domain.local

     Address 1: ##.##.##.2 vhost1.domain.local

I have tried entering the account name as "administrator", "DOMAIN\Administrator" and "administrator@domain.local".  In the past, if the syntax of the domain account has been an issue (where it's usually fixable in DHCP options) it will return "administrator" as "administrator".  In the OP I posted where I used "administrator" and it returned "administrator@DOMAIN.LOCAL", so I presume it's got some level of understanding of what domain I want it to talk to!  ...just not enough to actually talk.

That the error code is coming from the dc is news to me.  Perhaps I need to check the DNS record for the vhosts?  I know both vhost1 and vhost2 have existed in DNS for a long time, but just in case I deleted and re-added them to be sure nothing's goofy, then retried the join.  Same Lsass error.  -_-

Edit: I missed where you said to nslookup the vhost from the dc.  Here's that output:

Microsoft Windows [Version 10.0.14393]

(c) 2016 Microsoft Corporation. All rights reserved.

U:\>nslookup domain.local

     Server:  dc1.domain.local

     Address:  ##.##.##.10

     Name:    domain.local

     Addresses:  ##.##.##.11

     ##.##.##.10

U:\>nslookup vhost1.domain.local

     Server:  dc1.domain.local

     Address:  ##.##.##.10

     Name:    vhost1.domain.local

     Address:  ##.##.##.2

sk84
Expert

At this point I don't have any more ideas what could be the reason for this error.

That the error code is coming from the dc is news to me.

Yes. The error comes from the domain controller. But we don't know if it's a failure or misconfiguration in the domain controller itself or if it only reacts to an unforeseen value from the ESXi host.

Maybe an analysis with the tool "dcdiag" will get you somewhere. It could help to check the general health of the domain controller: Dcdiag | Microsoft Docs

chaplina
Enthusiast

SMB1 may be required to do the initial join.
