DCs and DNS were down at the time, but I expected to be able to log in as root, as per this excerpt from 'Enabling Active Directory Authentication with ESX Server':
Q. What if my ESX Server computer cannot contact a domain controller?
A. By preserving root access to the local service console, authorized personnel can always log on as root, even if contact with the domain is lost.
You do this by making sure AD or Kerberos is NOT affecting any account with a UID < 500. It sounds like you have it affecting all accounts, which will lead to the chicken-and-egg lockout you are seeing.
In /etc/pam.d/system-auth you would use a line similar to
Thanks Edward, that is similar to what Andrea recommended. I will try that.
The VMware document 'Enabling Active Directory Authentication with ESX Server' doesn't make any reference to this, or in fact any manual edit of configuration files. Perhaps an oversight on their part (as we aren't all Linux gurus).
I recreated the environment (AD infrastructure powered down) to try to reproduce the problem - but couldn't. Login with the root account proceeds normally, without making the recommended changes to system-auth.
Having run into a dead end, I logged an SR with VMware and am awaiting a response.
It is always best never to make 'root' part of your AD domain. Add administrative users instead, and when they log in have them use sudo rather than su or direct root access. root should be used only for critical issues like the one you had, so it should remain available outside any remote authentication service.
This is unfortunate. Most of the commercial AD bridge products have some sort of "root@localhost" mechanism, just like a Windows desktop has a local administrator. You might want to check the Samba mailing lists to see if they have anything equivalent. Maybe you should create a "toor" user who is only defined locally just for these situations.
mp
The problem is that anything with a UID < 500 will be sent first to AD, regardless of whether there is such a username in AD or not. If that fails, it will bail out of the login process. It's the UID in Linux that is really the issue. You could still create a 'toor' or 'root@localhost' account, but because of the setup esxcfg-auth does by default, it will still send, or try to send, to AD and fail. Remember, Linux uses UIDs more than usernames. You need to not send UIDs < 500 to AD. Some would even say UIDs < 1000 should not go to AD, so you can keep your local users.
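As a rough sketch of what keeping UIDs < 500 away from AD can look like in the auth section of /etc/pam.d/system-auth: this assumes pam_succeed_if.so is present on your service console (older builds may not ship it) and that the module paths match what esxcfg-auth wrote, so compare it against your own file before changing anything.

```
# /etc/pam.d/system-auth, auth section only
# Illustrative ordering, NOT the esxcfg-auth default. Module paths are assumed.
auth  required    /lib/security/$ISA/pam_env.so
# Local accounts (root, vpxuser, ...) are tried against local files first.
auth  sufficient  /lib/security/$ISA/pam_unix.so likeauth nullok
# Stop here for any UID below 500 so it is never handed to Kerberos/AD.
auth  requisite   /lib/security/$ISA/pam_succeed_if.so uid >= 500 quiet
# Only UIDs >= 500 reach the AD (Kerberos) module.
auth  sufficient  /lib/security/$ISA/pam_krb5.so use_first_pass
auth  required    /lib/security/$ISA/pam_deny.so
```

With this ordering, root and other local accounts are satisfied by pam_unix before Kerberos is consulted, and anything with a UID below 500 that fails local authentication stops at pam_succeed_if instead of being passed to pam_krb5. Some pam_krb5 builds also accept a minimum_uid=500 option on the pam_krb5.so line, which has a similar effect; check the man page on your host before relying on it.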
It is the PAM modules. If you are using pam_krb5.so, then that is the case. It depends on how the PAM modules are set up more than anything. NSS happens in pam_unix mainly.
Also remember that hostd uses PAM modules, and you must protect the 'vpxuser' account from using AD or you may get a worse mess. All of this depends on your level of integration.