I just deployed vSAN 7 U2 File Services in my lab, including the AD configuration with a custom OU (instead of the "Computers" container), and everything works fine. I created the static records in DNS beforehand and verified that both the forward and reverse records are correct (no typos).
File Services is not configured to update DNS itself as the configured AD user (dedicated service account) has no rights to do so.
Everything works fine: NFS shares, SMB shares. The SMB shares can be managed and accessed, and the whole AD integration works without any problems whatsoever.
Still, the DNS error "One or more DNS server is not reachable or File server IP and FQDN not matching with DNS entries" stubbornly appears in Skyline Health. Under "File Server Health", the "DNS Lookup" check is in an error state on all deployed File Services nodes, while the "Infrastructure Health" and "Share Health" checks are all green.
Naturally, I went over everything again. I SSH'ed to the VCSA, ran nslookup against each of the DNS servers I configured, one by one, and did forward and reverse tests for all File Services nodes on all DNS servers. All good. No DNS replication issues. All DNS servers are reachable and everything resolves forward and back 100%, no typos.
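For anyone who wants to script the same forward and reverse test instead of reading nslookup output by eye, here is a minimal Python sketch. The function name, host name, and IP address are made up for illustration. By default it would use the system resolver through the standard `socket` module, so it does not target one specific DNS server the way nslookup can. It reports a strict comparison next to a case-insensitive one, because a difference in letter case is easy to overlook when eyeballing the output.

```python
import socket


def check_forward_reverse(fqdn, expected_ip,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Compare forward (A) and reverse (PTR) lookups for one host.

    `forward` and `reverse` default to the system resolver; they can be
    swapped for stubs so the logic can be tried without a network.
    """
    resolved_ip = forward(fqdn)
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
    ptr_name = reverse(expected_ip)[0].rstrip(".")
    return {
        "forward_ok": resolved_ip == expected_ip,
        "reverse_exact": ptr_name == fqdn,
        "reverse_ignoring_case": ptr_name.lower() == fqdn.lower(),
    }


# Offline demo with stub resolvers (hypothetical name and address).
def stub_forward(name):
    return "192.0.2.11"


def stub_reverse(ip):
    return ("VSANFS-01.MYCOMPANY.COM", [], [ip])


print(check_forward_reverse("vsanfs-01.mycompany.com", "192.0.2.11",
                            stub_forward, stub_reverse))
```

To run it against real DNS, call `check_forward_reverse(fqdn, ip)` without the two stub arguments.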
All components (the ESXi servers, vCenter, the DNS servers and the bunch of vSAN File Server VMs) are in the same subnet, so there are no firewall issues (which I checked anyway, just for good measure).
So what is Skyline Health complaining about?
The only thing I noticed is that the VCSA does not do short-name lookups. It can only do FQDN lookups because the VCSA itself has no search domains (/etc/resolv.conf has no such entries). So, for example, it cannot resolve "vSANFS-01" at all, but it can resolve "vSANFS-01.mycompany.com" forward and back. But as everything is always configured with FQDNs (I never use short names), this is unlikely to be the cause of the health-check error.
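To illustrate why a missing search domain makes short names unresolvable, here is a small Python sketch. It is a simplification under stated assumptions: the two helper functions are hypothetical, the `ndots` option is ignored, and any name containing a dot is treated as already fully qualified. It only shows which names a resolver would have available to try, it does not perform lookups.

```python
def search_domains(resolv_conf_text):
    """Return the search domains from resolv.conf-style text.

    Simplified: the last `search` or `domain` line wins, and comment
    lines starting with '#' or ';' are skipped.
    """
    domains = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] in ("search", "domain"):
            domains = fields[1:]
    return domains


def candidate_names(name, domains):
    """List the names that would be tried for `name` (ndots ignored)."""
    if "." in name or not domains:
        # Already qualified, or nothing to append: only the name itself.
        return [name]
    return [f"{name}.{domain}" for domain in domains] + [name]


# Without a search line, the short name is tried as-is and fails in DNS.
no_search = "nameserver 192.0.2.53\n"
print(candidate_names("vSANFS-01", search_domains(no_search)))

# With a search line, the FQDN becomes a candidate.
with_search = "nameserver 192.0.2.53\nsearch mycompany.com\n"
print(candidate_names("vSANFS-01", search_domains(with_search)))
```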
Is there a detailed error log that I can dive into to find out why Skyline Health keeps complaining about those DNS errors?
Yes, the issue has finally been fixed. In our case it was caused by a mismatch in the DNS records: the records were created in upper case, while the vSAN FS appliances were configured in lower case. We had to remove all the DNS records, wait for replication to complete across the board, and then recreate the records in lower case. Once that was done, the error cleared. The engineer I was working with said that this issue will be addressed in a future update, to avoid similar calls in the future.
The DNS upper-case/lower-case thing was also the cause of the problem here. In the vCenter web GUI, the names of the four File Services nodes were in lower-case. In DNS, they were all in upper-case. Removing the DNS records and replacing them with lower-case variants solved the problem completely.
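To spot this kind of mismatch before Skyline Health does, a short Python sketch like the following can compare the node names as configured with the names stored in DNS. The function and the sample names are hypothetical, and this is not the health check's actual code. It only flags pairs that are equal when case is ignored but differ as plain strings, which is what a case-sensitive comparison would trip over.

```python
def case_only_mismatches(configured_names, dns_record_names):
    """Return (configured, dns_record) pairs that differ only in case."""
    # Index DNS names by their lower-cased form, without a trailing dot.
    by_folded = {name.rstrip(".").lower(): name.rstrip(".")
                 for name in dns_record_names}
    mismatches = []
    for name in configured_names:
        plain = name.rstrip(".")
        record = by_folded.get(plain.lower())
        if record is not None and record != plain:
            mismatches.append((plain, record))
    return mismatches


# Hypothetical sample data: one record was created in upper-case.
configured = ["vsanfs-01.mycompany.com", "vsanfs-02.mycompany.com"]
in_dns = ["VSANFS-01.MYCOMPANY.COM", "vsanfs-02.mycompany.com"]

for wanted, found in case_only_mismatches(configured, in_dns):
    print(f"configured as {wanted!r} but DNS holds {found!r}")
```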
For anyone having the same issue: in my case, removing the DNS records in Microsoft AD DNS caused further trouble. When we re-entered the A records in lower-case, the Microsoft DNS GUI kept showing them in upper-case after a refresh, no matter what we did.
As it turned out, when I looked inside the AD DNS database with Sysinternals ADExplorer, the deleted A records, as well as their PTR records, were all still present in the DB, in upper-case, and marked as "tombstoned = yes". We had to delete this "cache" of both A and PTR records and replicate the deletion to the other AD controllers before we were able to enter the A records (+PTR) in the MS DNS GUI so that they would be lower-case and stay that way.
As long as those stale cache records exist on some AD controller somewhere, the old name (upper-case in our case, pun intended) keeps coming back again and again, because it replicates right back. Very frustrating, so use ADExplorer to check and, if needed, delete the records you already deleted from the MS DNS GUI, or you will be chasing your own tail 😉
Sidenote: Simply put, DNS should be case-insensitive if I'm not mistaken. Regardless, please ask the developer(s) to make the health-check code ignore the case of DNS query results.