I have multiple hosts at remote sites running the free version of ESXi 5.5, with no vCenter or anything like that.
I have these hosts joined to our domain. After joining, we are able to log in with the proper ESXi administrator accounts; however, the "Trusted Domains" field stays blank.
This works for a few days, but eventually logins with domain accounts start to fail with the error "A general system error occurred:"
Manual authentication with an AD account doesn't work either, but local accounts still work. To fix this, we run the /usr/sbin/services.sh restart command. After that we can log back in with AD accounts and all is well, but a few days later it stops working again.
I'd like to figure out why this keeps happening. I've enabled the netlogond.log file and I'm getting entries such as:
20150727162919:0xffa37b70:INFO:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:97] Looking for a DC in domain 'domain.org', site '<null>' with flags 140
20150727162919:0xffa37b70:DEBUG:[LWNetCacheDbQuery() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/lwnet-cachedb.c:765] Cached entry not found: domain.org, , 1
20150727162919:0xffa37b70:DEBUG:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:128] Error at /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:128 [code: 1355]
20150727162919:0xffa15b70:INFO:[LWNetSrvGetDCTime() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:434] Determining the current time for domain 'domain.org'
20150727162919:0xffa15b70:INFO:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:97] Looking for a DC in domain 'domain.org', site '<null>' with flags 10
20150727162919:0xffa26b70:INFO:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:97] Looking for a DC in domain 'domain.org', site '<null>' with flags 0
20150727162924:0xffa15b70:INFO:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:97] Looking for a DC in domain 'domain.org', site '<null>' with flags 0
20150727162924:0xffa48b70:INFO:[LWNetSrvGetDCName() /build/mts/release/bora-1471401/likewise/esxi-esxi/src/linux/netlogon/server/api/dcinfo.c:97] Looking for a DC in domain 'domain.org', site '<null>' with flags 0
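For anyone sifting through a larger netlogond.log, here is a rough Python sketch that splits lines like the ones above into fields and counts the error codes. The field layout is inferred only from these sample lines, the function names are made up for illustration, and the note that 1355 matches the Win32 code ERROR_NO_SUCH_DOMAIN assumes Likewise uses the same numbering here.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the sample lines: timestamp:thread:LEVEL:[Func() file:line] message
LINE_RE = re.compile(
    r"^(?P<ts>\d{14}):(?P<thread>0x[0-9a-f]+):(?P<level>[A-Z]+):"
    r"\[(?P<func>\w+)\(\) (?P<file>[^\]]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
CODE_RE = re.compile(r"\[code: (\d+)\]")

# Assumption: the code follows Win32 numbering, where 1355 is ERROR_NO_SUCH_DOMAIN.
KNOWN_CODES = {1355: "ERROR_NO_SUCH_DOMAIN (domain not found or could not be contacted)"}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y%m%d%H%M%S")
    code = CODE_RE.search(entry["msg"])
    entry["code"] = int(code.group(1)) if code else None
    return entry


def count_error_codes(lines):
    """Count how often each [code: N] value appears in the given lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["code"] is not None:
            counts[entry["code"]] += 1
    return counts
```

Feeding the log file object to count_error_codes(open(path)) gives a quick tally to compare between a working and a failing period; path is wherever the log was written on the host.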
I've also done numerous tests, such as pinging the DCs and verifying that FQDNs resolve, and checked the DNS entries and time setup. Everything seems correct.
Any idea why this keeps happening? Any additional logging that I can look into?
Are the ESXi hosts and the DCs both syncing time from the same NTP server?
I have a feeling this has to do with a time difference. Next time the logins start failing, check the time on both servers.
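As a rough illustration of that check, the Python sketch below compares two timestamps noted at the same moment on the host and on the DC against the 5 minute clock skew tolerance that Kerberos uses by default in Active Directory. The ISO 8601 UTC input format and the function names are assumptions made for this example, and the tolerance may have been changed by domain policy.

```python
from datetime import datetime, timedelta

# Default Kerberos clock skew tolerance in AD; domain policy may override it.
MAX_SKEW = timedelta(minutes=5)

# Assumed input format: UTC timestamps such as 2015-07-27T16:29:19Z
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def clock_skew(host_time, dc_time):
    """Return the absolute difference between the two timestamp strings."""
    host = datetime.strptime(host_time, TIME_FORMAT)
    dc = datetime.strptime(dc_time, TIME_FORMAT)
    return abs(host - dc)


def within_tolerance(host_time, dc_time, max_skew=MAX_SKEW):
    """True if the two clocks are close enough for Kerberos authentication."""
    return clock_skew(host_time, dc_time) <= max_skew
```

If within_tolerance returns False at the moment the domain logins break, time drift is the likely culprit; if it returns True, the cause is probably elsewhere.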
Besides netlogond there are other Likewise agents. Try enabling logging for them too; see the VMware KB article "Enabling logging for Likewise agents on ESXi/ESX".
Also, take a look at this discussion: How to Troubleshoot "Errors in Active Directory Operations" Messages
Yeah, all devices are syncing from the same source, and I did verify that the times were synced correctly when these issues came up. I'll look at additional logging as well as that article you sent, although it seems to be aimed more at version 6.
EDIT: I did enable lwiod.log but I'm not seeing anything super interesting so far.
*bump*
Any other feedback or ideas? Any extra logging I can do?
I've also started to see this problem on hosts running 5.5.0 build 2718055.
I turned on advanced logging for lwiod, lsassd and netlogond and am seeing the same messages as in the original post.
Time is being synced with the local domain controller.
I have unbound and rebound the host and run /bin/services.sh restart after doing so.
Anyone have ANY other ideas? Kind of a high priority issue.