After a power outage at our DR site I wasn't able to log in to the ESX console (via iLO) or over SSH as root. The login process accepted the user name and prompted for the password, then hung for a while and bounced back to the login prompt.
We use Kerberos for authentication against AD. Because the hosts had powered up before the SAN, our disks weren't available at that stage, so all VMs were powered off (including DNS and the domain controllers).
Now, I fully expected not to be able to log in with my local account, but I never expected the root account to be locked out; otherwise I wouldn't have implemented Kerberos authentication in the first place. The VMware manual also clearly states that root access will always be available.
I was actually able to change the root password via single-user mode, which got me in until a reboot, after which I was locked out again. I've checked AD and there is no root account. I did find an old (2006) reference in a Red Hat document to a bug that caused this same behaviour, but it was marked as resolved.
Has anyone had this issue?
Hello,
Moved to the Security forum.
DCs and DNS were down at the time, but I expected to be able to log in as root, as per this excerpt from 'Enabling Active Directory Authentication with ESX Server':
Q. What if my ESX Server computer cannot contact a domain controller?
A. By preserving root access to the local service console, authorized personnel can always log on as root, even if contact with the domain is lost.
You do this by making sure AD or Kerberos is NOT affecting any account with a uid < 500. It sounds like you have it affecting all accounts which will lead to the chicken and the egg lockout you are seeing.
In /etc/pam.d/system-auth you would use a line similar to
auth sufficient /lib/security/$ISA/pam_krb5.so use_first_pass minimum_uid=500
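A quick way to confirm that the option is actually in place is to scan the PAM file for pam_krb5 auth lines that lack it. The following is only a minimal sketch: krb5_auth_guarded is a hypothetical helper name, and it assumes the option is spelled minimum_uid, so a misspelled option is reported as unguarded.

```shell
#!/bin/sh
# Sketch: report whether every pam_krb5 "auth" line in a PAM file sets minimum_uid.
# krb5_auth_guarded is a hypothetical helper name, not part of ESX or PAM.
krb5_auth_guarded() {
    pam_file=$1
    # Collect non-comment auth lines that call pam_krb5.so.
    krb5_lines=$(grep -E '^[[:space:]]*auth[[:space:]].*pam_krb5\.so' "$pam_file")
    if [ -z "$krb5_lines" ]; then
        echo "no pam_krb5 auth line"
        return 0
    fi
    # Any such line without minimum_uid= can send root logins to Kerberos.
    if printf '%s\n' "$krb5_lines" | grep -qv 'minimum_uid='; then
        echo "unguarded"
        return 1
    fi
    echo "guarded"
}

# Usage (the path is the one discussed in this thread):
#   krb5_auth_guarded /etc/pam.d/system-auth
```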
Best regards,
Edward L. Haletky
VMware Communities User Moderator, VMware vExpert 2009, DABCC Analyst
====
Now Available on Rough-Cuts: 'VMware vSphere(TM) and Virtual Infrastructure Security: Securing ESX and the Virtual Environment'
Also available 'VMWare ESX Server in the Enterprise'
SearchVMware Pro|Blue Gears|Top Virtualization Security Links|Virtualization Security Round Table Podcast
I have heard of lockdown mode for ESXi, but not for ESX. If this were ESXi, that is what I would point to, but...
The only other thing I can think of: is it possible you didn't type the root password correctly, and that after you reset the root password you were able to log in successfully?
You may want to check /var/log/messages
>We use kerberos for authentication with AD
I think that the lockdown process is handled inside AD (policy dependent), not ESX.
Also check the status of your root user (if you have a second user or sudo credentials) with
chage -l root
Andrea
**if you found this or any other answer useful please consider allocating points for helpful or correct answers
I was definitely typing the correct root password... even temporarily reset it to 'password'.
Hadn't heard of lockdown mode before, but on reading up it still allows console access for root:
http://communities.vmware.com/docs/DOC-7833
I should point out that I can now log in as root again, but I don't know what resolved the issue (and this affected 4 ESX hosts, not just one). I still suspect a bug in the Kerberos implementation, because I have always understood root access the way Edward described it in another discussion I came across:
Hello,
The root user is NEVER locked out. Without access to the root user you can not manage the system.
Best regards,
Edward L. Haletky
VMware Communities User Moderator, VMware vExpert 2009
Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
PS chage -l shows the root account never expires and password never expires
PPS there are quite a few entries in /var/log/messages of 'account not found in kerberos database', but I'm not sure if that refers to a local database or AD... and if AD, whether it really couldn't find the account or just couldn't talk to AD.
Verify time synchronisation for your Hosts
If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points
Tom Howarth VCP / vExpert
VMware Communities User Moderator
Blog: www.planetvm.net
Contributing author for the upcoming book "VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment" (http://my.safaribooksonline.com/9780136083214). Currently available on Rough Cuts.
>Verify time synchronisation for your Hosts
Will do tomorrow when back in the office. Are you suggesting ESX still tries to talk to AD when root logs in... and if host time is out of synch with AD the process fails?
It is a possibility; Kerberos is very sensitive to time differences.
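To make the time sensitivity concrete: Kerberos implementations commonly reject requests when the client and server clocks differ by more than about five minutes. The sketch below compares two clock readings against that tolerance. skew_within_limit is a hypothetical helper, and the 300-second default is an assumption; the real limit depends on your realm or domain policy.

```shell
#!/bin/sh
# Sketch: compare two clocks (seconds since the epoch) against a Kerberos-style tolerance.
# skew_within_limit is a hypothetical helper; the 300 s default is an assumption.
skew_within_limit() {
    local_epoch=$1
    reference_epoch=$2
    max_skew=${3:-300}
    diff=$((local_epoch - reference_epoch))
    # Take the absolute value so it does not matter which clock is ahead.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    if [ "$diff" -le "$max_skew" ]; then
        echo "ok: skew ${diff}s"
    else
        echo "too large: skew ${diff}s"
        return 1
    fi
}

# Usage: pass the host's clock and a reading taken from a trusted time source.
#   skew_within_limit "$(date +%s)" "$reference_epoch"
```

How you obtain the reference reading (a domain controller, an NTP server) is up to you; the function only does the comparison.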
Tom Howarth VCP / vExpert
Weird issue. If you can at some point you might want to evacuate all the VMs off one of the hosts and isolate it from the AD network and reboot.
Does the same issue occur? If yes there is something up with your kerb setup. If the issue does not reoccur it may have been a one time confluence of circumstances.
It's starting to look like a problem with kerberos versus login process timeouts... as described below:
Note that this module assumes the network is available in order to do a Kerberos authentication, and if the network is not available, some Kerberos libraries have timeouts longer than the timeout imposed by the login process. This means that using this module incautiously can make it impossible to log on to console as root. For this reason, you should always use the ignore_root or minimum_uid options, list a local authentication module such as pam_unix first with a control field of sufficient so that the Kerberos PAM module will be skipped if local password authentication was successful.
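The ordering advice in that note can be checked mechanically. As a rough sketch (first_auth_module is a hypothetical helper), the function below prints whichever of pam_unix.so or pam_krb5.so appears first among the auth lines of a PAM file. If it prints pam_unix.so and that line uses the sufficient control field, a correct local root password should succeed before the Kerberos module is ever consulted.

```shell
#!/bin/sh
# Sketch: print which of pam_unix.so / pam_krb5.so comes first among the "auth" lines.
# first_auth_module is a hypothetical helper name.
first_auth_module() {
    pam_file=$1
    awk '
        # Comment lines have "#..." as their first field, so they never match "auth".
        $1 == "auth" && /pam_(unix|krb5)\.so/ {
            if ($0 ~ /pam_unix\.so/) print "pam_unix.so"
            else print "pam_krb5.so"
            exit
        }
    ' "$pam_file"
}

# Usage:
#   first_auth_module /etc/pam.d/system-auth
```

The helper only reports the order; it does not check the control field, so read the matching line yourself before relying on the result.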
I've posted a question on LinuxQuestions.org asking how the login process works when using Kerberos authentication. I initially thought that logging in as root bypassed the Kerberos process, but now I'm thinking otherwise.
I used the following command to configure AD authentication (replacing domain.com with my domain name):
/usr/sbin/esxcfg-auth --enablead --addomain=domain.com --addc=domain.com
Can someone tell me where I need to place the ignore_root parameter mentioned above (file and line)?
>Can someone tell me where I need to place the ignore_root parameter mentioned above (file and line)?
If you create a user in AD with account name root, you can logon as root
with its AD password.
If you don't want AD authentication for root, you can edit
/etc/pam.d/system-auth. On the line that starts with auth and also includes
pam_krb5.so, add this to the end: minimum_uid=1. Authentication for root (uid=0)
will now be done locally only.
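If you would rather not hand-edit the file, the change can be previewed first. This is a sketch only: add_minimum_uid is a hypothetical helper that prints the modified file to stdout and never writes in place, so you can compare the result with the original before copying anything over it.

```shell
#!/bin/sh
# Sketch: print a PAM file with minimum_uid appended to unguarded pam_krb5 auth lines.
# add_minimum_uid is a hypothetical helper; it writes to stdout and never edits in place.
add_minimum_uid() {
    pam_file=$1
    min_uid=${2:-1}
    awk -v opt="minimum_uid=$min_uid" '
        # Only touch auth lines that call pam_krb5.so and do not already set the option.
        $1 == "auth" && /pam_krb5\.so/ && !/minimum_uid=/ { print $0 " " opt; next }
        # Everything else (account, password, session, comments) passes through unchanged.
        { print }
    ' "$pam_file"
}

# Usage: preview the change and compare it with the current file.
#   add_minimum_uid /etc/pam.d/system-auth 1 > /tmp/system-auth.new
#   diff /etc/pam.d/system-auth /tmp/system-auth.new
```

Keep a root session open while testing any PAM change, so a mistake cannot lock you out.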
If you want the AD user to have the same rights as root, you can set the
user's UID to 0 (usermod -o -u 0 username; the -o flag permits a duplicate UID).
Of course, if you have used minimum_uid, that won't work.
Alternatively, use sudoers to allow users to use sudo to execute specific
tasks as root.
Andrea
Thanks for the tip Andrea.
Even with your recommended change to /etc/pam.d/system-auth, will the first line below still send a request to AD (and time out when the network is down)?
account sufficient /lib/security/$ISA/pam_krb5.so
account required /lib/security/$ISA/pam_unix.so
Would swapping these lines around avoid that?
>Would swapping these lines around avoid that?
The order is correct; if you swap them, the Unix auth becomes required.
When you ran esxcfg-auth, did you also add the --enablecache parameter?
Andrea
>The order is correct; if you swap them, the Unix auth becomes required.
Not quite sure what you mean?
I don't use --enablecache... wouldn't it be a security risk to cache root logins?
>I don't use --enablecache... wouldn't it be a security risk to cache root logins?
Yes, it is a security risk. I only suggested it as a way to check whether the problem is with the AD communications.
Are DNS and time fine?
Andrea
DCs and DNS were down at the time, but I expected to be able to log in as root, as per this excerpt from 'Enabling Active Directory Authentication with ESX Server':
Q. What if my ESX Server computer cannot contact a domain controller?
A. By preserving root access to the local service console, authorized personnel can always log on as root, even if contact with the domain is lost.
Thanks Edward, that is similar to what Andrea recommended. I will try that.
The VMware document 'Enabling Active Directory Authentication with ESX Server' doesn't make any reference to this, or in fact any manual edit of configuration files. Perhaps an oversight on their part (as we aren't all Linux gurus).
I recreated the environment (AD infrastructure powered down) to try to reproduce the problem - but couldn't. Login with the root account proceeds normally, without making the recommended changes to system-auth.
Having run into a dead end, I logged an SR with VMware and am awaiting a response.
Thanks for your help.
Hello,
I documented it in http://www.astroarch.com/wiki/index.php/Full_Integration_of_Active_Directory which is quite a bit different from what VMware puts out. You really need that option. If AD is not running, or you have 'lost' the cached credentials due to time, then you will lock out the root account, as it will go to pam_krb5.so for everything. That is how the PAM modules work. Check out http://www.astroarch.com/wiki/index.php/Remote_Authentication for other techniques as well.
It is always best never to make 'root' part of your AD domain. Instead, add administrative users who, once logged in, use sudo rather than su or direct root access. root should be used only for critical issues such as the one you had, so it should be made available outside any remote authentication service.
Best regards,
Edward L. Haletky
VMware Communities User Moderator, VMware vExpert 2009, DABCC Analyst
This is unfortunate. Most of the commercial AD bridge products have some sort of "root@localhost" mechanism, just like a Windows desktop has a local administrator. You might want to check the Samba mailing lists to see if they have anything equivalent. Maybe you should create a "toor" user who is only defined locally just for these situations.
mp