Hello,
one of our ESX hosts suddenly appeared as disconnected in VirtualCenter. In the messages file I see the message "Rejected password for user vpxuser"; vpxuser is the account used for communication between VirtualCenter and the ESX host.
As I understand it, the password is generated when the host is added to VirtualCenter. So my first question: is it normal for the password to be changed afterwards? If so, is it a known issue that it can get out of sync between VirtualCenter and one host (all the others worked fine), and is there a patch that corrects this?
(The only option I read about was to set "esxcfg-auth --maxfailedlogins" to 0, which seems very risky from a security point of view, no?)
As a result of the mismatched password, VirtualCenter seems to have discarded the ESX host from the cluster, since it was no longer able to "talk" to it. Is it normal that HA does not force the VMs to migrate to another host? All the VMs stayed on the host with the wrong vpxuser password and I was not able to manage them from VirtualCenter. At the same time, the machines on the "faulty" host became so slow that users could no longer work with them. Is there a logical explanation for the slowness, or could another root cause be behind both issues?
My last question: is there a way to ensure that the VMs are automatically migrated to another ESX host when something like this happens?
Thank you very much in advance
PR
The only ways I know of for the vpxuser password to change are adding the ESX host to another VC environment, or someone manually changing the password on the ESX host itself as root.
If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
You can also just delete the vpxuser account and then reconnect the host to the cluster; a new vpxuser account is created in the process of adding it to the cluster.
userdel -r vpxuser
Hi,
HA was not triggered because the HA agents are installed on every ESX host and do not depend on VC for operation or communication. You need VC for the initial configuration, but after that the HA agents are independent of VC, so from their point of view the host was still up. Why the VMs became slow needs digging in several places.
For example: how much free disk space you have on the VMFS partitions and in the /var/log folder, and whether the host is overcommitted on resources or one of the VMs is hogging them. You should also look for storage-related issues in /var/log/vmkernel.
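The free-space part of that checklist can be scripted. Below is a minimal Python sketch, not a VMware tool: the function names and the 10% threshold are my own assumptions, and the paths in the example call are placeholders to replace with your own datastore mount points.

```python
import os


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        # Pseudo-filesystems can report zero blocks; treat them as "no free space".
        return 0.0
    # f_bavail counts the blocks available to unprivileged users.
    return float(st.f_bavail) / st.f_blocks


def low_space(paths, threshold=0.10):
    """Return [(path, fraction)] for existing paths with less free space than threshold.

    The 10% default is an arbitrary assumption, not a VMware recommendation.
    """
    flagged = []
    for p in paths:
        if not os.path.exists(p):
            continue
        frac = free_fraction(p)
        if frac < threshold:
            flagged.append((p, frac))
    return flagged


if __name__ == "__main__":
    # Placeholder paths: point these at /var/log and at each datastore directory.
    for path, frac in low_space(["/var/log", "/"]):
        print("%s: only %.1f%% free" % (path, frac * 100))
```

It only reports on filesystems; overcommitment and storage errors in /var/log/vmkernel still have to be checked separately.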
Thanks,
Samir
P.S : If you think this information is helpful please consider awarding points
Thank you for all your answers, which already cover parts of my questions.
Concerning the answer of weinstein:
- I was at lunch when the incident happened, and my colleague, who also knows the password, is on holiday. Normally no one else should know the password. Two other people have access to VirtualCenter, but I do not know of a way to change the password from there. Or is it possible to change the vpxuser password from the Virtual Infrastructure Client? Normally someone would have to SSH to the server to change the password, right? Is that traceable in a log file?
- To Troy, thanks for pointing me to the command. If it happens again, I know what to do.
- To kooltechies, thanks for the explanation about why HA did not move the machines, sounds logical to me.
Here are some parts of the different log files that seem relevant (sorry for posting so much). In the messages file I see "Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser", but I can't explain how this happened. As I said, my colleague and I were not in the office and no one else knows the root password; even if someone had logged in to the server, I do not see why they would have changed the vpxuser password (unless it was intentional). But perhaps someone sees something that went wrong and resulted in the password being changed.
Here are the log file extracts (I changed the name of the host that had problems to esx-esx1 in the logs for privacy reasons):
From the hostd-files:
Event 944 : User vpxuser@127.0.0.1 logged in
Task Created : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718
FormatField: Optional unset (vim.event.VmUuidChangedEvent.vm)
FormatField: Optional unset (vim.event.VmUuidChangedEvent.host)
FormatField: Optional unset (vim.event.VmUuidChangedEvent.datacenter)
Current value 205420 exceeds hard limit 204800. Shutting down process.
Block List Service Plugin stopping
Block List Service Plugin stopped
Task Completed : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718
.....
Vmacore::InitSSL: doVersionCheck = false, handshakeTimeoutUs = 120000000
Initialized SSL context with version all
Host name: esx-esx1.win.vdl.lu
Environments file: /etc/vmware/hostd/environments.xml
HAL05LoadHALLibraries: Could not dlopen libhal.so.1.
HAL04LoadHALLibraries: Could not dlopen libhal.so.0.
....
Block List Service Plugin started
dynamicType = <unset>,
name = "VMware ESX Server",
fullName = "VMware ESX Server build-64607",
vendor = "VMware, Inc.",
version = "3.5.0",
build = "64607",
localeVersion = <unset>,
localeBuild = <unset>,
osType = "vmnix-x86",
productLineId = "esx",
apiType = "HostAgent",
apiVersion = "2.5.0",
}
......
VM inventory configuration: /etc/vmware/hostd/vmInventory.xml
Max supported virtual machines: 1200
Reloading config state: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx Mounting virtual machine paths on connection: /db/connection/#67/, /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx
Mount VM completion for vm: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx
Mount VM Complete: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx, Return code: OK
Connected to /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx:testAutomation-fd, remote end sent pid: 102385
DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : open successful (23) size = 9663676416, hd = 0. Type 3
DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : closed.
Could not find VM 2848. Not setting capabilities.
DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : open successful (17) size = 9663676416, hd = 0. Type 3
DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : closed.
(VM_STATE_INITIALIZING -> VM_STATE_ON)
Loaded virtual machine: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx
Then it does the same for all VMs
.......
Check resources every 30 secs, soft limit 122880, hard limit 204800.
Event 1 : Failed login attempt for vpxuser@127.0.0.1
(vim.fault.InvalidLogin) {
dynamicType = <unset>,
msg = ""
}
Event 2 : Failed login attempt for vpxuser@127.0.0.1
(vim.fault.InvalidLogin) {
dynamicType = <unset>,
msg = ""
}
Then I see about 550 Failed logins for vpxuser....
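One thing that stands out in the hostd extract is that the "exceeds hard limit ... Shutting down process" line sits between the creation and completion of a LocalAccountManager.updateUser task. A small Python sketch like the one below could check a full hostd log for that pattern. The function names are hypothetical, the patterns are based only on the lines quoted above, and a match shows that the two events overlapped, not that one caused the other.

```python
import re

HARD_LIMIT = re.compile(r"Current value (\d+) exceeds hard limit (\d+)")
UPDATE_USER = re.compile(
    r"Task (Created|Completed) : (\S*LocalAccountManager\.updateUser\S*)"
)


def scan_hostd(lines):
    """Return (line_no, kind, detail) for hard-limit and updateUser task lines."""
    hits = []
    for no, line in enumerate(lines, 1):
        m = HARD_LIMIT.search(line)
        if m:
            hits.append((no, "hard-limit", "%s > %s" % m.groups()))
            continue
        m = UPDATE_USER.search(line)
        if m:
            hits.append((no, "updateUser " + m.group(1).lower(), m.group(2)))
    return hits


def limit_hit_during_update(hits):
    """True if a hard-limit line appears while an updateUser task is still open."""
    open_tasks = set()
    for _, kind, detail in hits:
        if kind == "updateUser created":
            open_tasks.add(detail)
        elif kind == "updateUser completed":
            open_tasks.discard(detail)
        elif kind == "hard-limit" and open_tasks:
            return True
    return False


if __name__ == "__main__":
    # Lines copied from the hostd extract in this post.
    sample = [
        "Event 944 : User vpxuser@127.0.0.1 logged in",
        "Task Created : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718",
        "Current value 205420 exceeds hard limit 204800. Shutting down process.",
        "Task Completed : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718",
    ]
    for hit in scan_hostd(sample):
        print(hit)
    print("limit hit during updateUser:", limit_hit_during_update(scan_hostd(sample)))
```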
From the "message" file:
Feb 10 12:10:27 esx-esx1 vmware-hostd[1912]: Accepted password for user vpxuser from 127.0.0.1
Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser
Feb 10 12:10:28 esx-esx1 vmware-authd(pam_unix)[1912]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root
Feb 10 12:10:33 esx-esx1 watchdog-hostd: '/usr/sbin/vmware-hostd -u -a' exited after 12082141 seconds
Feb 10 12:10:33 esx-esx1 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'
Feb 10 12:10:33 esx-esx1 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u -a'
Feb 10 12:10:36 esx-esx1 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 6818596 seconds
Feb 10 12:10:36 esx-esx1 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'
Feb 10 12:10:48 esx-esx1 modprobe: modprobe: Can't locate module char-major-14
Feb 10 12:10:48 esx-esx1 modprobe: modprobe: Can't locate module block-major-2
Feb 10 12:10:48 esx-esx1 last message repeated 6 times
Feb 10 12:10:49 esx-esx1 modprobe: modprobe: Can't locate module char-major-14
Feb 10 12:10:49 esx-esx1 modprobe: modprobe: Can't locate module block-major-2
Feb 10 12:10:49 esx-esx1 last message repeated 6 times
Feb 10 12:10:50 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root
Feb 10 12:11:33 esx-esx1 modprobe: modprobe: Can't locate module block-major-2
Feb 10 12:11:33 esx-esx1 last message repeated 6 times
Feb 10 12:11:33 esx-esx1 modprobe: modprobe: Can't locate module char-major-14
Feb 10 12:12:04 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:07 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Feb 10 12:12:14 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:17 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Feb 10 12:12:24 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:27 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Feb 10 12:12:34 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:37 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Feb 10 12:12:44 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:47 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Feb 10 12:12:54 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser
Feb 10 12:12:57 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1
Then 850 failed logins for vpxuser
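To answer my own question about whether the change is traceable: the messages file does record it. A short Python sketch along these lines could pull out the time of the first password change and count the rejected logins that follow. It assumes the syslog line layout shown above ("Mon DD HH:MM:SS host message"), and the function name is made up for illustration.

```python
import re

# Matches the syslog layout seen in the extract: "Feb 10 12:10:28 esx-esx1 <message>".
LINE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s(.*)$")


def summarize(lines, user="vpxuser"):
    """Return (timestamp of first password change or None, rejected logins after it)."""
    changed_at = None
    rejected = 0
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp, _host, msg = m.groups()
        if changed_at is None:
            if "password changed for %s" % user in msg:
                changed_at = stamp
        elif "Rejected password for user %s" % user in msg:
            rejected += 1
    return changed_at, rejected


if __name__ == "__main__":
    # Lines copied from the messages extract in this post.
    sample = [
        "Feb 10 12:10:27 esx-esx1 vmware-hostd[1912]: Accepted password for user vpxuser from 127.0.0.1",
        "Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser",
        "Feb 10 12:12:07 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1",
        "Feb 10 12:12:17 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1",
    ]
    print(summarize(sample))
```

This shows when the change happened, but not who or what started the passwd process.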
From the vpxa - file:
CMD: Tue Feb 10 12:09:29 2009 /opt/vmware/aam/bin/ft_gethostbyname esx-esx1 |grep FAILED
main::verify_network_configuration:69: cmd status was 0
CMD: Tue Feb 10 12:10:31 2009 /opt/vmware/aam/bin/ft_gethostbyname esx-esx1 |grep FAILED
RESULT:
-
esx-esx6 Primary Agent Running
esx-esx5 Primary Agent Running
esx-esx4 Primary Agent Running
esx-esx3 Secondary Agent Running
esx-esx2 Primary Agent Running
esx-esx1 Primary Agent Running
VMwareresult=success
Total time for script to complete: 0 minute(s) and 1 second(s)
Failed to send request. Retrying. Error: N7Vmacore15SystemExceptionE(Broken pipe)
Failed to send request. Retrying. Error: N7Vmacore15SystemExceptionE(Connection reset by peer)
Failed to send request. Retrying. Error: N7Vmacore4Http24MalformedHeaderExceptionE(Incomplete header received)
Received callback in WaitForUpdatesDone
No such file or directory
Failed to get resource pool summary: No such file or directory
Received unexpected error from property collector: No such file or directory
on unregistering listener.
P'ÿ¿˜‘ÿ¿È « on unregistering listener.
Can't connect to hostd/serverd. Shutting down...
....
Using system libcrypto, version 90701F
DLSYM: Failed to resolve FIPS_mode_set: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: FIPS_mode_set
DLSYM: Failed to resolve FIPS_mode: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: FIPS_mode
DLSYM: Failed to resolve SHA256: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: SHA256
DLSYM: Failed to resolve SHA512: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: SHA512
DLSYM: Failed to resolve EVP_sha224: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha224
DLSYM: Failed to resolve EVP_sha256: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha256
DLSYM: Failed to resolve EVP_sha384: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha384
DLSYM: Failed to resolve EVP_sha512: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha512
Vmacore::InitSSL: doVersionCheck = false, handshakeTimeoutUs = 120000000
Removing stale symlink /var/run/vmware/vmware%2dvpxa
Removing stale cnx files in /var/run/vmware/root/27832
Starting VMware VirtualCenter Agent Agent Daemon 2.5.0 build-84767
Init: Succeeded with directory = /var/log/vmware/journal
VMware ESX Server 3.5.0 build-64607
Manager IP: 10.150.4.120:902 Host IP: 10.150.4.33
Increment master gen. no to (1): HostConfig:VpxaInvtHost::Init
Increment master gen. no to (2): ResourcePool:VpxaInvtHost::Init
Increment master gen. no (3): Init
Creating temporary connect spec: localhost:443
Failed to discover namespace: Connection refused
Could not resolve namespace for authenticating to host agent
Session timeout is 1440 minutes
VMware ESX Server 3.5.0 build-64607
VMware ESX Server 3.5.0 build-64607
Starting SOAP adapter on named pipe /var/run/vmware/proxy-vpxa
Using new VMDB VMOMI serialization format
Found previous domain socket /var/run/vmware/proxy-vpxa. Removing...
Document root directory not found at /etc/vmware/docRoot
Http Service started: Server UNIX(/var/run/vmware/proxy-vpxa)
SOAP adapter started on port -1
Check resources every 30 secs, soft limit 204800, hard limit 256000.
Creating temporary connect spec: localhost:443
Error fetching /definitions/import/@namespace from /sdk/vimService?wsdl: 404
(Not Found)
Could not resolve namespace for authenticating to host agent
Creating temporary connect spec: localhost:443
Local and Remote Namespace are the same. Talking with namespace vim25
Connecting to hostd with vim namespace vim25
heartbeating 10.150.4.120 ...Count 1
No auth data found for privileged operation
-- BEGIN task-internal-1 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca
Invoke done: vpxapi.VpxaService.getVpxaInfo session: 520405ca-8480-119d-01d7-8395e6c95cca
-- FINISH task-internal-1 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca
Invoking on session [520405ca-8480-119d-01d7-8395e6c95cca]
-- BEGIN task-internal-2 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca
-- FINISH task-internal-2 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca
-- ERROR task-internal-2 -- -- vpxapi.VpxaService.login: vim.fault.InvalidLogin:
(vim.fault.InvalidLogin) {
dynamicType = <unset>,
msg = "Login failed due to a bad username or password."
}
heartbeating 10.150.4.120 ...Count 2
No auth data found for privileged operation
-- BEGIN task-internal-3 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca
Invoke done: vpxapi.VpxaService.getVpxaInfo session: 520405ca-8480-119d-01d7-8395e6c95cca
-- FINISH task-internal-3 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca
-- BEGIN task-internal-4 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca
-- FINISH task-internal-4 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca
-- ERROR task-internal-4 -- -- vpxapi.VpxaService.login: vim.fault.InvalidLogin:
(vim.fault.InvalidLogin) {
dynamicType = <unset>,
msg = "Login failed due to a bad username or password."
}
I see this message about 500 times here
I had a similar problem.
In my case I came in one morning to find that the host was disconnected. I rebooted the vCenter server, after which the host connected, but all VMs on the host were disconnected. The vpxuser password had changed the previous evening when no one was around, and as I am the only admin here, it certainly wasn't me.
Here is what I did to resolve the problem (I still do not know why the password changed, though): http://communities.vmware.com/thread/188621
Hello Toni,
thanks for your answer. I am "glad" to hear that I am not the only one with this problem. At least I can be fairly sure that we were not attacked and that no one changed the password manually. It seems to be an issue in VMware; can anyone confirm this?
Is there a patch or workaround for this problem?
Best regards
Claude
Add me to the list of those who have also experienced a MYSTERIOUS password change. Searching through the forums, I have to believe something is going on with VC to cause this. A number of people have had their vpxuser password change without any explanation.
I came in today and all the VM's on a particular host were disconnected. I was seeing vpxuser password problems.
Same here. I have some additional information:
One host showed up as disconnected, and all its guests were disconnected. The log file reported a password change for user vpxuser, initiated from 127.0.0.1, followed by authentication errors. The same errors appeared in the logs on 2 other hosts, occurring at the same time as on the first host; however, these 2 were not disconnected, and their guests appeared as normal.
I wonder: why does something change the password without my knowing? And is this sudden password change the reason that one ESX host got disconnected?
Hi Bisi,
I have run into the same problem as you, but my hosts are ESX 4.0. They frequently disconnect from vCenter.
Have you found any root cause? Can you share your solution?
Thanks!
Lan