I have two ESXi 6.7U1 vhosts that I would like to join to a domain; I'm starting with vhost1 (IP ##.##.##.2). The domain runs on two Windows Server 2016 domain controllers (VMs) named dc1 and dc2, with IPs ##.##.##.10 and ##.##.##.11 respectively. SMBv1 is disabled on both domain controllers. I have removed IPv6 from the network everywhere I could find it, including on the hosts and their VMs. I have no other indication that the domain is malfunctioning, but I have had to re-sync the Group Policy/SYSVOL shares in the past, so it's possible my problem lies with the DCs somehow.
I have tried to join the domain from both hosts, with identical results. Joining via the web UI (Manage > Authentication > Join Domain) produces this message in Recent Tasks: "failed - The specified domain either does not exist or could not be contacted." I then went to the CLI (logged in as root). Below are my interactions, mostly commands copied from pages I found on Google, including solutions that worked for other people but only return (various) errors for me.
[root@vhost1:~] /etc/init.d/lwsmd start
Starting Likewise Service Manager [memory reservation set] SUCCESS
[Setting SMBv2 enabled to true] [starting lsass service] Starting service: lsass
...ok
[root@vhost1:~] chkconfig lwsmd on
[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator
Joining to AD Domain: domain.local
With Computer DNS Name: vhost1.domain.local
administrator@DOMAIN.LOCAL's password:
Error: Lsass Error [code 0x00000043]
Network name not found.. Failure to lookup a domain name ending in ".local" may be the result of
configuring the local system's hostname resolution (or equivalent) to use Multi-cast DNS. Please refer
to the Likewise manual at
witch for more information.
[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator@domain.local
Joining to AD Domain: domain.local
With Computer DNS Name: vhost1.domain.local
administrator@DOMAIN.LOCAL's password:
Error: ERROR_GEN_FAILURE [code 0x0000001f]
[root@vhost1:~] /usr/lib/vmware/likewise/bin/lwsm set-log file /var/log/likewise.log
[root@vhost1:~] /usr/lib/vmware/likewise/bin/lwsm set-log-level info
[root@vhost1:~] /etc/init.d/lwsmd stop
watchdog-lwsmd: PID file /var/run/vmware/watchdog-lwsmd.PID does not exist
watchdog-lwsmd: Unable to terminate watchdog: No running watchdog process for lwsmd
Stopping Likewise Service Manager [memory reservation released] ...failed
[root@vhost1:~] /etc/init.d/lwsmd start
Starting Likewise Service Manager [memory reservation set] SUCCESS
[Setting SMBv2 enabled to true] [starting lsass service] Starting service dependency: netlogon
Starting service dependency: lwio
Starting service dependency: rdr
Starting service: lsass
...ok
[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator
Joining to AD Domain: domain.local
With Computer DNS Name: vhost1.domain.local
administrator@DOMAIN.LOCAL's password:
Error: DNS_ERROR_BAD_PACKET [code 0x0000251e]
A bad packet was received from a DNS server. Potentially the requested address does not exist.
[root@vhost1:~] cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
##.##.##.2 vhost1.domain.local vhost1
[root@vhost1:~] /etc/init.d/lsassd stop
-sh: /etc/init.d/lsassd: not found
[root@vhost1:~] esxcli network ip dns server list
DNSServers: ##.##.##.10, ##.##.##.11
[root@vhost1:~] cat /etc/krb5.conf
[libdefaults]
default_tgs_enctypes = AES256-CTS AES128-CTS RC4-HMAC
default_tkt_enctypes = AES256-CTS AES128-CTS RC4-HMAC
preferred_enctypes = AES256-CTS AES128-CTS RC4-HMAC
allow_weak_crypto = true
dns_lookup_kdc = true
pkinit_kdc_hostname = <DNS>
pkinit_anchors = DIR:/etc/likewise/trusted_certs
pkinit_cert_match = <EKU>msScLogin
pkinit_eku_checking = kpServerAuth
pkinit_win2k_require_binding = false
pkinit_identities = PKCS11:/usr/lib/vmware/likewise/lib/libpkcs11wrapper.so.0
default_realm = DOMAIN.LOCAL
[likewise]
disable_modifications = false
version = 1
[domain_realm]
.domain.local = DOMAIN.LOCAL
[realms]
DOMAIN.LOCAL = {
auth_to_local = RULE:[1:$0\$1](^DOMAIN\.LOCAL\\.*)s/^DOMAIN\.LOCAL/DOMAIN/
auth_to_local = RULE:[1:$0\$1](^DOMAIN\.LOCAL\\.*)s/^DOMAIN\.LOCAL/DOMAIN/
auth_to_local = DEFAULT
}
[appdefaults]
pam = {
mappings = DOMAIN\\(.*) $1@DOMAIN.LOCAL
forwardable = true
validate = true
}
httpd = {
mappings = DOMAIN\\(.*) $1@DOMAIN.LOCAL
reverse_mappings = (.*)@DOMAIN\.LOCAL DOMAIN\$1
}
[root@vhost1:~] ping dc1
PING dc1 (##.##.##.10): 56 data bytes
64 bytes from ##.##.##.10: icmp_seq=0 ttl=128 time=0.247 ms
64 bytes from ##.##.##.10: icmp_seq=1 ttl=128 time=0.631 ms
--- dc1 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.247/0.439/0.631 ms
[root@vhost1:~] ping dc2
PING dc2 (##.##.##.11): 56 data bytes
64 bytes from ##.##.##.11: icmp_seq=0 ttl=128 time=0.307 ms
64 bytes from ##.##.##.11: icmp_seq=1 ttl=128 time=0.558 ms
--- dc2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.307/0.432/0.558 ms
Vhost1 is a fresh install on new hardware, still running the temporary license. Vhost2 is more established and licensed as a free ESXi host. There is no VCSA currently (I did set one up to show off how nice vMotion/HA is, but that's not part of the Essentials pack, so it's cost-prohibitive). My general question is: what does all this tell me I need to troubleshoot next? What more info should I gather to confirm what is and isn't working?
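One way to answer "what more info should I grab" is to dump the host-side settings this thread keeps coming back to (hostname, DNS servers and search domains, host time, /etc/hosts, krb5.conf, a lookup of the domain, and the lwio rdr registry values) into one file that can be compared between vhost1 and vhost2. This is a minimal sketch for the ESXi shell; `section` and `collect_ad_diag` are hypothetical helper names, and the esxcli namespaces used exist on 6.x hosts as far as I know. If one doesn't, `section` just records the error and carries on. It only collects; it doesn't change anything.

```shell
# section CMD [ARGS...]: print a header, run the command, and keep going if it fails.
# printf is used instead of echo so backslashes in arguments are printed literally.
section() {
  printf '=== %s ===\n' "$*"
  "$@" 2>&1 || echo "(exit status $?)"
}

# collect_ad_diag DOMAIN: gather the host-side settings discussed in this thread.
collect_ad_diag() {
  section esxcli system hostname get
  section esxcli network ip dns server list
  section esxcli network ip dns search list
  section esxcli system time get
  section date -u
  section cat /etc/hosts
  section cat /etc/krb5.conf
  section nslookup "$1"
  section /usr/lib/vmware/likewise/bin/lwregshell list_values \
    '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]'
}
```

Usage: `collect_ad_diag domain.local > /tmp/ad-diag.txt` on each host, then compare the two files side by side.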
Just wanted to tie this one up in case anyone else runs into this issue. I spent many hours troubleshooting my (otherwise working) domain, retrying with different credentials and DCDIAGing until the cows came home. Separately, ESXi 6.7U1 gave me fits when I set it up to use an NFS share on a NAS as a shared datastore: one VM worked okay, but by the time I got to two or three VMs, performance just ground to a halt, which I'd never seen before. When I checked the NAS forums, they blamed the NAS, and me for getting the wrong NAS, and called out the fact that I didn't have an SSD in it, even though I've never needed one before...
So anyhow, long story short: I recently installed ESXi 6.7U2, and it resolved all my issues with the stupid NAS/NFS datastore thing. I also just confirmed that both of my vhosts are able to join the domain. If you run into this same issue with ESXi 6.7U1, I recommend moving to ESXi 6.7U2, because the older version appears to have been the cause of all my headaches. Bleeding-edge technology indeed!
Thanks to everyone who tried to help for your advice and suggestions. If nothing else, it helped me confirm I wasn't just missing something dumb.
Still not having any luck with this. Does anyone have suggestions? Since my original post I've tried modifying the firewall with no luck, and I've confirmed the hostname is vhost1.domain.local (not just vhost1, as I would have expected for a host not yet joined to the domain). I've confirmed vhost1 can ping, nslookup, and netcat the domain controller on port 389.
More info:
[root@vhost1:~] ping dc1
PING dc1 (##.##.##.10): 56 data bytes
64 bytes from ##.##.##.10: icmp_seq=0 ttl=128 time=0.404 ms
64 bytes from ##.##.##.10: icmp_seq=1 ttl=128 time=0.507 ms
64 bytes from ##.##.##.10: icmp_seq=2 ttl=128 time=0.457 ms
--- dc1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.404/0.456/0.507 ms
[root@vhost1:~] nslookup dc1
Server: ##.##.##.10
Address 1: ##.##.##.10
Name: dc1
Address 1: ##.##.##.10
[root@vhost1:~] nslookup dc1.domain.local
Server: ##.##.##.10
Address 1: ##.##.##.10
Name: dc1.domain.local
Address 1: ##.##.##.10
[root@vhost1:~] nc -z dc1 389
Connection to dc1 389 port [tcp/ldap] succeeded!
Can you try to join the hosts via the CLI?
Yes. In my OP I pasted the results of "/usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator".
More notes: SMB2 is confirmed enabled on the vhost. Running domainjoin-cli against an obviously bad domain returns NERR_DCNotFound [code 0x00000995], which I think tells me that I am communicating with the DC on some level when I don't get that error. "domainjoin-cli fixfqdn" seems mostly to remove the FQDN from /etc/hosts. I'm not sure whether the FQDN should or should not be in /etc/hosts, but neither option seems to be my smoking gun.
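Since whether the FQDN belongs in /etc/hosts is the open question, it helps to see exactly what the file says before and after running `domainjoin-cli fixfqdn`. A minimal sketch that prints every non-comment line listing a given name; `hosts_entry` is a hypothetical helper, and it makes no judgement about which layout is correct.

```shell
# hosts_entry FILE NAME: print each non-comment hosts line whose host names include NAME.
hosts_entry() {
  awk -v name="$2" '
    /^[ \t]*#/ { next }                      # skip comment lines
    { for (i = 2; i <= NF; i++)              # field 1 is the IP, the rest are names
        if ($i == name) { print; next } }
  ' "$1"
}
```

Usage: `hosts_entry /etc/hosts vhost1.domain.local` and `hosts_entry /etc/hosts vhost1`, run once before and once after `fixfqdn`.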
[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]'
+ "SMB2Enabled" REG_DWORD 0x00000001 (1)
"EchoInterval" REG_DWORD 0x0000012c (300)
"EchoTimeout" REG_DWORD 0x0000000a (10)
"IdleTimeout" REG_DWORD 0x0000000a (10)
"MinCreditReserve" REG_DWORD 0x0000000a (10)
"Path" REG_SZ "/usr/lib/vmware/likewise/lib/librdr.sys.so"
"ResponseTimeout" REG_DWORD 0x00000014 (20)
"SigningEnabled" REG_DWORD 0x00000001 (1)
"SigningRequired" REG_DWORD 0x00000000 (0)
[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/lwsm restart lwio
Stopping service reverse dependency: lsass
Stopping service reverse dependency: rdr
Stopping service: lwio
Starting service: lwio
Starting service reverse dependency: rdr
Starting service reverse dependency: lsass
[root@vhost1:/etc/init.d] /usr/lib/vmware/likewise/bin/domainjoin-cli join --advanced --preview domain.local administrator
Joining to AD Domain: domain.local
With Computer DNS Name: vhost1.domain.local
[X] [N] join - join computer to AD
[X] [N] krb5 - configure krb5.conf
[X] [N] cache - manage caches for this host
I've run this last status check multiple times, and once krb5 returned [X] [S], so it considered that step correct at one point during my troubleshooting. I guess I'll dig into krb5.conf and see if anything stands out as misconfigured there.
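One thing worth confirming in krb5.conf is that `default_realm` is the DNS domain in upper case (DOMAIN.LOCAL for domain.local, as in the file posted earlier). A minimal sketch that checks only that one setting; `realm_check` is a hypothetical name, and a match says nothing about the rest of the file.

```shell
# realm_check KRB5_CONF DOMAIN: compare default_realm with the upper-cased DNS domain.
realm_check() {
  # Take the value of the first "default_realm = ..." line, with whitespace stripped.
  realm=$(awk -F= '$1 ~ /^[ \t]*default_realm[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1")
  want=$(printf '%s' "$2" | tr 'a-z' 'A-Z')
  if [ "$realm" = "$want" ]; then
    echo "default_realm $realm matches $2"
  else
    echo "default_realm '$realm' does not match expected '$want'"
    return 1
  fi
}
```

Usage: `realm_check /etc/krb5.conf domain.local`.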
Any thoughts? I'm still scratching at a whole lot of nothing with this.
Are you following the steps in "Configure a Host to Use Active Directory"?
First, check the time settings on your ESXi hosts and domain controllers.
Then set the DNS servers for your ESXi hosts to the DNS servers of your domain.
And if there are any firewalls between the ESXi hosts and your domain controllers and domain DNS servers, check that all necessary ports are open:
Active Directory and Active Directory Domain Services Port Requirements | Microsoft Docs
The domain controllers are VMs on the host, and they're set to sync time via VMware Tools. They appear to have identical times.
The ESXi hosts are using the correct DNS server IPs, which are set via DHCP static assignment. I even set the IP/DNS settings to static on the domain controllers (which are also the DNS servers), and that didn't appear to change anything.
I've disabled the ESXi firewall and all firewalls on the DC/DNS servers to make sure that isn't the problem, and I've checked that ports are open using netcat ("nc -z dc1 389" above reported success; the other usual suspects, such as 53 and 88, also connected without issue).
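Kerberos is sensitive to clock differences, and Active Directory's default tolerance is 5 minutes (300 seconds). With the DCs taking their time from the host, matching clocks are expected, but a quick arithmetic check rules skew out explicitly. A minimal sketch, assuming both clocks have been read as Unix epoch seconds (for example `date -u +%s` on the host, and `[DateTimeOffset]::UtcNow.ToUnixTimeSeconds()` in PowerShell on the DC); `skew_check` and `MAX_SKEW` are hypothetical names.

```shell
# Assumed tolerance: AD's default maximum clock skew of 5 minutes.
MAX_SKEW=300

# skew_check HOST_EPOCH DC_EPOCH: print the absolute difference and whether it is tolerated.
skew_check() {
  diff=$(( $1 - $2 ))
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  if [ "$diff" -le "$MAX_SKEW" ]; then
    echo "skew ${diff}s: ok"
  else
    echo "skew ${diff}s: too large"
    return 1
  fi
}
```

Read both values as close together in time as you can; a few seconds of lag between the two readings is harmless against a 300-second window.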
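Since netcat was already the tool of choice, the per-port checks can be looped over both DCs. A minimal sketch, assuming the `nc -z HOST PORT` form used above; `check_ports` is a hypothetical helper, and the port list (DNS 53, Kerberos 88, RPC endpoint mapper 135, LDAP 389, SMB 445, Kerberos password change 464, LDAPS 636, Global Catalog 3268) is my selection from Microsoft's AD port requirements page. `nc -z` only probes TCP, so the UDP side of DNS and Kerberos is not covered.

```shell
# check_ports HOST PORT...: report each TCP port as open or closed using nc -z.
check_ports() {
  host=$1; shift
  for port in "$@"; do
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

Usage: `for dc in dc1 dc2; do check_ports "$dc" 53 88 135 389 445 464 636 3268; done`.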
[root@vhost1:~] esxcli network firewall ruleset set -r activeDirectoryAll --enabled 1
[root@vhost1:~] esxcli network ip dns search add -d domain.local
[root@vhost1:~] esxcli system hostname set --domain=domain.local
[root@vhost1:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join domain.local administrator
Joining to AD Domain: domain.local
With Computer DNS Name: vhost1.domain.local
administrator@DOMAIN.LOCAL's password:
Error: Lsass Error [code 0x00000043]
Network name not found.. Failure to lookup a domain name ending in ".local" may be the result of
configuring the local system's hostname resolution (or equivalent) to use Multi-cast DNS. Please refer
to the Likewise manual at
witch for more information.
Same Lsass error I was getting in the OP. I have Googled the Multi-cast DNS error and don't believe any of what is described applies to my situation.
Yes. And I've joined dozens of ESXi hosts to domains, so I'm really at a loss for what makes this one so unique.
Error: Lsass Error [code 0x00000043]
This error code comes from the domain controller; VMware doesn't use these error codes. So either something is wrong on the AD/PDC side, or the domain controller doesn't like a value supplied by the ESXi host during the domain join.
And, in fact, the Windows error code 0x00000043 is "Network name cannot be found".
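Every code domainjoin-cli printed in this thread is a standard Windows error value, and converting the hex to decimal makes it easy to look up in Microsoft's system error code lists. A minimal sketch; `decode_err` is a hypothetical helper and its table covers only the four codes seen here.

```shell
# decode_err HEXCODE: print the decimal value and symbolic name for codes seen in this thread.
decode_err() {
  dec=$(printf '%d' "$1") || return 1   # printf accepts 0x-prefixed hex input
  case "$dec" in
    31)   name="ERROR_GEN_FAILURE" ;;
    67)   name="ERROR_BAD_NET_NAME (The network name cannot be found)" ;;
    2453) name="NERR_DCNotFound" ;;
    9502) name="DNS_ERROR_BAD_PACKET" ;;
    *)    name="unknown (not one of the codes seen in this thread)" ;;
  esac
  echo "$1 = $dec = $name"
}
```

Usage: `decode_err 0x0000251e`. Leading zeros don't matter, so the nine-digit form some outputs show decodes the same way.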
Can you please try the following to check if DNS is really working as expected:
On one ESXi host:
nslookup domain.local
It should return your domain controller(s).
On the domain controller(s):
nslookup vhost1.domain.local
It should return the IP address of your ESXi host and the responding DNS server should be localhost (127.0.0.1) if your domain controller(s) have also installed the DNS role.
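On the ESXi side, the first check can be scripted by reading only the addresses listed after the `Name:` line of BusyBox nslookup output (the address above it belongs to the DNS server that answered). A minimal sketch assuming the `Address N: IP [name]` format shown in this thread; Windows nslookup uses a different layout, so this is for the host only, and `resolved_addrs`/`expect_addrs` are hypothetical helpers.

```shell
# resolved_addrs: read BusyBox-style nslookup output on stdin and print the
# addresses that follow the "Name:" line (field 3 of "Address N: IP name").
resolved_addrs() {
  awk '/^Name:/ { seen = 1; next } seen && /^Address/ { print $3 }'
}

# expect_addrs NAME IP...: fail if any expected IP is missing from the lookup result.
expect_addrs() {
  name=$1; shift
  got=$(nslookup "$name" 2>/dev/null | resolved_addrs)
  for ip in "$@"; do
    printf '%s\n' "$got" | grep -qxF "$ip" || { echo "$name: missing $ip"; return 1; }
  done
  echo "$name: all expected addresses present"
}
```

Usage: `expect_addrs domain.local <dc1-ip> <dc2-ip>`.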
And how do you supply the admin credentials for the domain join?
Have you tried "administrator@domain.local" instead of "domain\administrator" or "domain.local\administrator"? Strangely enough, I have had problems with this notation in the past.
Here's a fresh run of nslookup (but it matches the OP version):
[root@vhost1:~] nslookup domain.local
Server: ##.##.##.10
Address 1: ##.##.##.10 dc1.domain.local
Name: domain.local
Address 1: ##.##.##.10 dc1.domain.local
Address 2: ##.##.##.11 dc2.domain.local
[root@vhost1:~] nslookup vhost1.domain.local
Server: ##.##.##.10
Address 1: ##.##.##.10 dc1.domain.local
Name: vhost1.domain.local
Address 1: ##.##.##.2 vhost1.domain.local
I have tried entering the account name as "administrator", "DOMAIN\Administrator", and "administrator@domain.local". In the past, when the syntax of the domain account was the issue (usually fixable in DHCP options), it would echo "administrator" back as just "administrator". In the OP I used "administrator" and it came back as "administrator@DOMAIN.LOCAL", so I presume it has some understanding of which domain I want it to talk to! ...just not enough to actually talk.
That the error code is coming from the DC is news to me. Perhaps I need to check the DNS records for the vhosts? Both vhost1 and vhost2 have existed in DNS for a long time, but just in case, I deleted and re-added them to make sure nothing was goofy, then retried the join. Same Lsass error. -_-
Edit: I missed where you said to run nslookup for the vhost from the DC. Here's that output:
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.
U:\>nslookup domain.local
Server: dc1.domain.local
Address: ##.##.##.10
Name: domain.local
Addresses: ##.##.##.11
##.##.##.10
U:\>nslookup vhost1.domain.local
Server: dc1.domain.local
Address: ##.##.##.10
Name: vhost1.domain.local
Address: ##.##.##.2
At this point I don't have any more ideas what could be the reason for this error.
That the error code is coming from the dc is news to me.
Yes. The error comes from the domain controller. But we don't know if it's a failure or misconfiguration in the domain controller itself or if it only reacts to an unforeseen value from the ESXi host.
Maybe an analysis with the tool "dcdiag" will get you somewhere. It could help to check the general health of the domain controller: Dcdiag | Microsoft Docs
SMB1 may be required to do the initial join.