VMware Cloud Community
bisi
Contributor

ESX host disconnected from Virtual Infrastructure (after vpxuser's password apparently changed)?

Hello,

One of our ESX hosts suddenly appeared as disconnected in VirtualCenter. In the messages file I see the message "Rejected password for user vpxuser"; vpxuser is the account used for communication between VirtualCenter and the ESX host.

As I understand it, the password is generated when the host is added to VirtualCenter. So my first question: is it normal for the password to change afterwards? If so, is it a known issue that it can become desynchronized between VirtualCenter and one host (all the others worked fine), and is there a patch that corrects it?

(The only option I have read about is to set "esxcfg-auth --maxfailedlogins" to 0, which seems very risky from a security point of view, doesn't it?)

As a result of the mismatched password, VirtualCenter seems to have discarded the ESX host from the cluster, since it was no longer able to "talk" to it. Is it normal that HA does not force the VMs to migrate to another host? All the VMs were running on the host with the wrong vpxuser password, and I was not able to manage them from VirtualCenter. At the same time, we saw that the machines on the "faulty" host had become so slow that users were no longer able to work with them. Is there a logical explanation for this (the machines getting slow), or could there be another root cause behind both issues?

My last question: is there a way to ensure that the VMs are automatically migrated to another ESX host when something like this happens?

Thank you very much in advance.

PR

9 Replies
weinstein5
Immortal

The only ways I know of for the vpxuser password to change are adding the ESX host to another VC environment, or someone manually changing the password on the ESX host itself as root.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

Troy_Clavell
Immortal

You can also just delete the vpxuser account and then try to connect the host back into the cluster; a new vpxuser account will be created in the process of adding it to the cluster.

userdel -r vpxuser

kooltechies
Expert

Hi,

HA was not triggered because the HA agents are installed on every ESX host and do not depend on VC for operation or communication. You obviously need VC for the initial configuration, but after that the HA agents are independent of VC. Why the VMs are getting slow is something that needs digging in multiple places.
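To make that point concrete, here is a deliberately simplified Python model (not VMware code; the class and function names are hypothetical, and isolation response and admission control are ignored) in which HA failover depends only on HA agent heartbeats, so a lost VirtualCenter connection alone does not trigger it:

```python
# Hypothetical, simplified model: HA agents decide failover among themselves,
# so the VirtualCenter connection state is not an input to the decision.
from dataclasses import dataclass


@dataclass
class HostState:
    name: str
    ha_agent_heartbeat_ok: bool  # heartbeats seen by the other HA agents
    vc_connected: bool           # connection to VirtualCenter (ignored by HA)


def hosts_ha_would_fail_over(hosts):
    """Return the names of hosts whose VMs HA would try to restart elsewhere."""
    # Only loss of HA heartbeats counts; a VC disconnection alone does not.
    return [h.name for h in hosts if not h.ha_agent_heartbeat_ok]


cluster = [
    HostState("esx-esx1", ha_agent_heartbeat_ok=True, vc_connected=False),
    HostState("esx-esx2", ha_agent_heartbeat_ok=True, vc_connected=True),
]
print(hosts_ha_would_fail_over(cluster))  # -> []
```

In this model esx-esx1 is disconnected from VC but still heartbeating to the other agents, so nothing is restarted, which matches what was observed here.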

For example: how much free disk space you have on the VMFS partitions or in the /var/log folder, and whether the host is overcommitted in terms of resources or one of the VMs is hogging them. You should also look for any storage-related issues in /var/log/vmkernel.
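The two checks suggested above can be sketched in Python, assuming a Python 3.3+ interpreter that can see the paths in question. The function names, the 10% free-space threshold and the keyword list are hypothetical choices for illustration, not anything ESX defines:

```python
# Hypothetical helpers for the free-space and vmkernel-log checks.
import shutil


def low_space_paths(paths, min_free_fraction=0.10):
    """Return (path, free_fraction) for every path below the free-space threshold."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((path, round(free_fraction, 3)))
    return low


# Assumed keywords; real vmkernel storage messages vary by version and driver.
STORAGE_KEYWORDS = ("SCSI", "reservation", "abort", "timeout")


def storage_lines(lines, keywords=STORAGE_KEYWORDS):
    """Return log lines that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

For example, low_space_paths(["/var/log"]) reports /var/log if it has less than 10% free, and storage_lines(open("/var/log/vmkernel")) returns the matching vmkernel lines for a closer look.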

Thanks,

Samir

P.S : If you think this information is helpful please consider awarding points

Blog : http://thinkingloudoncloud.com || Twitter : @kooltechies
bisi
Contributor

Thank you for all your answers, which have already answered parts of my questions.

Concerning weinstein's answer:

- I was at lunch when the incident happened, and my colleague, who also knows the password, is on holiday. Normally no one else knows the password. Two other people have access to VirtualCenter, but I do not know how to change the password from there. Is it possible to change the vpxuser password from the Virtual Infrastructure Client? Normally someone would have to log in to the server via SSH to change the password. Is this traceable from a log file?

- To Troy: thanks for pointing me to the command. If it happens again, I know what to do.

- To kooltechies: thanks for the explanation of why HA did not move the machines; it sounds logical to me.

Here are some parts of the different log files which seem to be relevant (sorry for posting so much). In the messages file, I see "Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser", but I can't explain how this happened. As I said, my colleague and I were not in the office, no one else knows the root password, and even if someone had logged in to the server, I do not see why they would have changed the vpxuser password (unless it was intentional). But perhaps someone sees something that went wrong and resulted in the password being changed...
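On the "is this traceable from a log file?" question: in the messages excerpt posted in this thread, vpxuser logs in from 127.0.0.1 one second before the password change. A small Python sketch that pairs each "password changed for" line with the most recent "Accepted password" line before it; the message formats are taken only from these excerpts and may differ on other ESX versions, and the function name is hypothetical:

```python
# Hypothetical helper: pair each password change with the last accepted login.
import re

ACCEPTED = re.compile(r"Accepted password for user (\S+) from (\S+)")
CHANGED = re.compile(r"password changed for (\S+)")


def password_changes(lines):
    """Yield (change_line, last_accepted_line_or_None) pairs from syslog lines."""
    last_accepted = None
    for line in lines:
        line = line.rstrip("\n")
        if ACCEPTED.search(line):
            last_accepted = line
        elif CHANGED.search(line):
            yield line, last_accepted


SAMPLE = [
    "Feb 10 12:10:27 esx-esx1 vmware-hostd[1912]: Accepted password for user vpxuser from 127.0.0.1",
    "Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser",
]
for change, login in password_changes(SAMPLE):
    print(change)
    print("  preceded by:", login)
```

Run against /var/log/messages (for example, password_changes(open("/var/log/messages"))), this lists every vpxuser password change together with the login that preceded it.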

Here are the log file extracts (I renamed the host that had problems to esx-esx1 in the logs for privacy reasons):

From the hostd files:

Event 944 : User vpxuser@127.0.0.1 logged in

Task Created : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718

FormatField: Optional unset (vim.event.VmUuidChangedEvent.vm)

FormatField: Optional unset (vim.event.VmUuidChangedEvent.host)

FormatField: Optional unset (vim.event.VmUuidChangedEvent.datacenter)

Current value 205420 exceeds hard limit 204800. Shutting down process.

END SERVICES

Block List Service Plugin stopping

Block List Service Plugin stopped

Plugin stopped

Plugin stopped

Plugin stopped

Plugin stopped

Stopped Proxy service

Stopped Proxy service

Proxy service stopped.

Plugin stopped

Shutdown: port udp/161 closed

Plugin stopped

Stopping stassvc plugin

Plugin stopped

VM Services Plugin stopping

VM Services Plugin stopped

Plugin stopped

Stopping vimsvc plugin

Stopping partitionsvc plugin

Task Completed : haTask-ha-folder-root-vim.host.LocalAccountManager.updateUser-889718

Resource checker stopped.

Proxy service stopped.

.....

Vmacore::InitSSL: doVersionCheck = false, handshakeTimeoutUs = 120000000

Initialized SSL context with version all

Host name: esx-esx1.win.vdl.lu

Compute resource instantiated

Environments file: /etc/vmware/hostd/environments.xml

Descriptor loaded: 0

Options loaded

Descriptor loaded: 1

Options loaded

HAL05LoadHALLibraries: Could not dlopen libhal.so.1.

HAL04LoadHALLibraries: Could not dlopen libhal.so.0.

....

VMServices Plugin initialized

Block List Service Plugin started

Plugin started

About:(vim.AboutInfo) {

dynamicType = <unset>,

name = "VMware ESX Server",

fullName = "VMware ESX Server build-64607",

vendor = "VMware, Inc.",

version = "3.5.0",

build = "64607",

localeVersion = <unset>,

localeBuild = <unset>,

osType = "vmnix-x86",

productLineId = "esx",

apiType = "HostAgent",

apiVersion = "2.5.0",

}

......

VM inventory configuration: /etc/vmware/hostd/vmInventory.xml

Max supported virtual machines: 1200

Reloading config state: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx Mounting virtual machine paths on connection: /db/connection/#67/, /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx

Mount VM completion for vm: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx

Mount VM Complete: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx, Return code: OK

Connected to /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx:testAutomation-fd, remote end sent pid: 102385

DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : open successful (23) size = 9663676416, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : closed.

Could not find VM 2848. Not setting capabilities.

DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : open successful (17) size = 9663676416, hd = 0. Type 3

DISKLIB-VMFS : "/vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA-flat.vmdk" : closed.

State Transition

(VM_STATE_INITIALIZING -> VM_STATE_ON)

Initialized virtual machine.

Loaded virtual machine: /vmfs/volumes/48aac548-193d96f4-4892-001a4ba9a2da/SRV-NTLMA/SRV-NTLMA.vmx

Then it does the same for all VMs

.......

Check resources every 30 secs, soft limit 122880, hard limit 204800.

Created session

BEGIN SERVICES

Event 1 : Failed login attempt for vpxuser@127.0.0.1

Activation : Invoke done on

Throw vim.fault.InvalidLogin

Result:

(vim.fault.InvalidLogin) {

dynamicType = <unset>,

msg = ""

}

Event 2 : Failed login attempt for vpxuser@127.0.0.1

Activation : Invoke done on

Throw vim.fault.InvalidLogin

Result:

(vim.fault.InvalidLogin) {

dynamicType = <unset>,

msg = ""

}

Then I see about 550 failed logins for vpxuser....
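In the hostd excerpt, the updateUser task (vim.host.LocalAccountManager.updateUser) is created just before hostd reports "Current value 205420 exceeds hard limit 204800. Shutting down process.", and its completion is logged only while hostd is shutting down; the messages file then shows the watchdog restarting hostd at 12:10:33. The limit lines can be read with the minimal model below. It is an illustration only: it is not hostd's code, it assumes "exceeds" means strictly greater than, and the log does not state the unit of the value.

```python
# Hypothetical model of the hostd resource check
# ("Check resources every 30 secs, soft limit 122880, hard limit 204800").
SOFT_LIMIT = 122880
HARD_LIMIT = 204800


def check_resource(value, soft=SOFT_LIMIT, hard=HARD_LIMIT):
    """Classify one sample: 'ok', 'soft' (over soft limit) or 'hard' (shutdown)."""
    if value > hard:
        return "hard"  # corresponds to "exceeds hard limit ... Shutting down process."
    if value > soft:
        return "soft"
    return "ok"


print(check_resource(205420))  # -> hard
```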

From the "messages" file:

Feb 10 12:10:27 esx-esx1 vmware-hostd[1912]: Accepted password for user vpxuser from 127.0.0.1

Feb 10 12:10:28 esx-esx1 passwd(pam_unix)[828]: password changed for vpxuser

Feb 10 12:10:28 esx-esx1 vmware-authd(pam_unix)[1912]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root

Feb 10 12:10:33 esx-esx1 watchdog-hostd: '/usr/sbin/vmware-hostd -u -a' exited after 12082141 seconds

Feb 10 12:10:33 esx-esx1 watchdog-hostd: Executing cleanup command '/usr/sbin/vmware-hostd-support'

Feb 10 12:10:33 esx-esx1 watchdog-hostd: Executing '/usr/sbin/vmware-hostd -u -a'

Feb 10 12:10:36 esx-esx1 watchdog-vpxa: '/opt/vmware/vpxa/sbin/vpxa' exited after 6818596 seconds

Feb 10 12:10:36 esx-esx1 watchdog-vpxa: Executing '/opt/vmware/vpxa/sbin/vpxa'

Feb 10 12:10:48 esx-esx1 modprobe: modprobe: Can't locate module char-major-14

Feb 10 12:10:48 esx-esx1 modprobe: modprobe: Can't locate module block-major-2

Feb 10 12:10:48 esx-esx1 last message repeated 6 times

Feb 10 12:10:49 esx-esx1 modprobe: modprobe: Can't locate module char-major-14

Feb 10 12:10:49 esx-esx1 modprobe: modprobe: Can't locate module block-major-2

Feb 10 12:10:49 esx-esx1 last message repeated 6 times

Feb 10 12:10:50 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root

Feb 10 12:11:33 esx-esx1 modprobe: modprobe: Can't locate module block-major-2

Feb 10 12:11:33 esx-esx1 last message repeated 6 times

Feb 10 12:11:33 esx-esx1 modprobe: modprobe: Can't locate module char-major-14

Feb 10 12:12:04 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:07 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Feb 10 12:12:14 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:17 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Feb 10 12:12:24 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:27 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Feb 10 12:12:34 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:37 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Feb 10 12:12:44 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:47 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Feb 10 12:12:54 esx-esx1 vmware-authd(pam_unix)[855]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=vpxuser

Feb 10 12:12:57 esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1

Then 850 failed logins for vpxuser
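The "Rejected password" lines in the messages excerpt arrive every 10 seconds (12:12:07, 12:12:17, ... 12:12:57), which looks like an automatic retry rather than someone typing. A small sketch that measures the interval from the timestamps; it assumes the year 2009 (taken from the vpxa log, since syslog lines omit the year), and the function names are hypothetical:

```python
# Measure the retry interval between "Rejected password" lines.
from datetime import datetime


def rejection_times(lines, year=2009):
    """Parse the timestamps of vpxuser rejection lines into datetimes."""
    times = []
    for line in lines:
        if "Rejected password for user vpxuser" in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Feb 10 12:12:07"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Seconds between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


SAMPLE = [
    f"Feb 10 12:12:{s:02d} esx-esx1 vmware-hostd[855]: Rejected password for user vpxuser from 127.0.0.1"
    for s in (7, 17, 27, 37, 47, 57)
]
print(intervals_seconds(rejection_times(SAMPLE)))  # -> [10.0, 10.0, 10.0, 10.0, 10.0]
```

If the roughly 850 failures mentioned above kept that 10-second pace, they would span about 8,500 seconds, or roughly 2 hours 20 minutes; that is an extrapolation, not something the excerpt itself shows.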

From the vpxa file:

CMD: Tue Feb 10 12:09:29 2009 /opt/vmware/aam/bin/ft_gethostbyname esx-esx1 |grep FAILED

main::verify_network_configuration:69: cmd status was 0

CMD: Tue Feb 10 12:10:31 2009 /opt/vmware/aam/bin/ft_gethostbyname esx-esx1 |grep FAILED

RESULT:

-


esx-esx6 Primary Agent Running

esx-esx5 Primary Agent Running

esx-esx4 Primary Agent Running

esx-esx3 Secondary Agent Running

esx-esx2 Primary Agent Running

esx-esx1 Primary Agent Running

VMwareresult=success

Total time for script to complete: 0 minute(s) and 1 second(s)

Command returned successfully

Failed to send request. Retrying. Error: N7Vmacore15SystemExceptionE(Broken pipe)

Failed to send request. Retrying. Error: N7Vmacore15SystemExceptionE(Connection reset by peer)

Failed to send request. Retrying. Error: N7Vmacore4Http24MalformedHeaderExceptionE (Incomplete header received)

Received callback in WaitForUpdatesDone

Worker: Unhandled exception:

No such file or directory

Failed to get resource pool summary: No such file or directory

-> eip 0x8f601b7

-> eip 0x8f627fe

-> eip 0x8f623b3

-> eip 0x9053c0a

-> eip 0x9093635

-> eip 0x9092248

-> eip 0x9091da8

-> eip 0x8fdd880

-> eip 0x8fddca6

-> eip 0x8fdeefd

-> eip 0x8fde826

-> eip 0x8fe4d9d

-> eip 0x9093e00

-> eip 0x909665d

-> eip 0x9015d3d

-> eip 0x8f620bd

-> eip 0x8f65e54

-> eip 0x8f71517

-> eip 0x8f6bd01

-> eip 0x8f6b852

-> eip 0x90458eb

-> eip 0x892ae27

-> eip 0x89295e6

-> eip 0x12679a

-> eip 0x8929081

Received unexpected error from property collector: No such file or directory

eip 0x9065e52

eip 0x900d2a4

eip 0x9047995

eip 0x8f601b7

eip 0x8f627fe

eip 0x8f623b3

eip 0x9053c0a

eip 0x9093635

eip 0x9092248

eip 0x9091da8

eip 0x8fdd880

eip 0x8fddca6

eip 0x8fdeefd

eip 0x8fde826

eip 0x8fe4d9d

eip 0x9093e00

eip 0x909665d

eip 0x9015d3d

eip 0x8f620bd

eip 0x8f65e54

eip 0x8f71517

eip 0x8f6bd01

eip 0x8f6b852

eip 0x90458eb

eip 0x892ae27

eip 0x89295e6

eip 0x12679a

eip 0x8929081

Backtrace:

eip 0x9065e52

eip 0x900d2a4

eip 0x9047995

eip 0x8f601b7

eip 0x8f627fe

eip 0x8f623b3

eip 0x9053c0a

eip 0x9093635

eip 0x9092248

eip 0x9091da8

eip 0x8fdd880

eip 0x8fddca6

eip 0x8fde5fb

eip 0x8fe4af6

eip 0x9093c1c

eip 0x9096796

eip 0x9015f91

eip 0x8f6215c

eip 0x8f65e54

eip 0x8f71517

eip 0x8f6bd01

eip 0x8f75218

eip 0x8f762ec

eip 0xf21dd8

eip 0x1edfca

Failed to destroy filter on unregistering listener.

Failed to destroy filter on unregistering listener.

Can't connect to hostd/serverd. Shutting down...

Shutting down now

....

Initializing SSL

Using system libcrypto, version 90701F

DLSYM: Failed to resolve FIPS_mode_set: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: FIPS_mode_set

DLSYM: Failed to resolve FIPS_mode: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: FIPS_mode

DLSYM: Failed to resolve SHA256: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: SHA256

DLSYM: Failed to resolve SHA512: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: SHA512

DLSYM: Failed to resolve EVP_sha224: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha224

DLSYM: Failed to resolve EVP_sha256: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha256

DLSYM: Failed to resolve EVP_sha384: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha384

DLSYM: Failed to resolve EVP_sha512: /opt/vmware/vpxa/vpx/vpxa: undefined symbol: EVP_sha512

Vmacore::InitSSL: doVersionCheck = false, handshakeTimeoutUs = 120000000

Initializing SSL Contexts

Removing stale symlink /var/run/vmware/vmware%2dvpxa

Removing stale cnx files in /var/run/vmware/root/27832

Starting VMware VirtualCenter Agent Agent Daemon 2.5.0 build-84767

Init: Succeeded with directory = /var/log/vmware/journal

Output:

VMware ESX Server 3.5.0 build-64607

Security policy allocated

32 max LROs

0 reserved internal LROs

0 reserved blocker LROs

6 reserved short LROs

2 reserved long LROs

600-second task lifetime

minimum pool size is 11

maximum pool size is 37

Manager IP: 10.150.4.120:902 Host IP: 10.150.4.33

Increment master gen. no to (1): HostConfig:VpxaInvtHost::Init

Increment master gen. no to (2): ResourcePool:VpxaInvtHost::Init

Increment master gen. no (3): Init

Last stats polling used ms

Creating temporary connect spec: localhost:443

Failed to discover namespace: Connection refused

Could not resolve namespace for authenticating to host agent

Session timeout is 1440 minutes

Output:

VMware ESX Server 3.5.0 build-64607

Output:

VMware ESX Server 3.5.0 build-64607

Starting SOAP adapter on named pipe /var/run/vmware/proxy-vpxa

Using new VMDB VMOMI serialization format

Found previous domain socket /var/run/vmware/proxy-vpxa. Removing...

Document root directory not found at /etc/vmware/docRoot

Http Service started: Server UNIX(/var/run/vmware/proxy-vpxa)

SOAP adapter started on port -1

Check resources every 30 secs, soft limit 204800, hard limit 256000.

Setting system limit of 1024

Set system limit to 1024

Last stats polling used ms

Creating temporary connect spec: localhost:443

Last stats polling used ms

Error fetching /definitions/import/@namespace from /sdk/vimService?wsdl: 404

(Not Found)

Could not resolve namespace for authenticating to host agent

Last stats polling used ms

Creating temporary connect spec: localhost:443

Local and Remote Namespace are the same. Talking with namespace vim25

Connecting to hostd with vim namespace vim25

Connect to host completed

Last stats polling used ms

heartbeating 10.150.4.120 ...Count 1

Invoking on session

No auth data found for privileged operation

Invoking on session

-- BEGIN task-internal-1 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca

Invoke done: vpxapi.VpxaService.getVpxaInfo session: 520405ca-8480-119d-01d7-8395e6c95cca

-- FINISH task-internal-1 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca

Invoking on session [520405ca-8480-119d-01d7-8395e6c95cca]

-- BEGIN task-internal-2 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca

-- FINISH task-internal-2 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca

-- ERROR task-internal-2 -- -- vpxapi.VpxaService.login: vim.fault.InvalidLogin:

(vim.fault.InvalidLogin) {

dynamicType = <unset>,

msg = "Login failed due to a bad username or password."

}

heartbeating 10.150.4.120 ...Count 2

Invoking on session

No auth data found for privileged operation

Invoking on session

-- BEGIN task-internal-3 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca

Invoke done: vpxapi.VpxaService.getVpxaInfo session: 520405ca-8480-119d-01d7-8395e6c95cca

-- FINISH task-internal-3 -- -- vpxapi.VpxaService.getVpxaInfo -- 520405ca-8480-119d-01d7-8395e6c95cca

Invoking on session

-- BEGIN task-internal-4 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca

-- FINISH task-internal-4 -- -- vpxapi.VpxaService.login -- 520405ca-8480-119d-01d7-8395e6c95cca

-- ERROR task-internal-4 -- -- vpxapi.VpxaService.login: vim.fault.InvalidLogin:

(vim.fault.InvalidLogin) {

dynamicType = <unset>,

msg = "Login failed due to a bad username or password."

}


I see this message about 500 times here
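The vpxa excerpt also prints the HA (AAM) agent list after the 12:10:31 check, and every host, including esx-esx1, reports "Running". That seems consistent with kooltechies' explanation that HA did not treat the host as failed. A small parser for lines of that shape; the regular expression is based only on the lines shown here, and the function names are hypothetical:

```python
# Parse "<host> <Primary|Secondary> Agent <state>" lines from the vpxa log.
import re

AGENT_LINE = re.compile(r"^(\S+)\s+(Primary|Secondary)\s+Agent\s+(\S+)$")


def agent_states(lines):
    """Map host name -> (role, state) for every agent-status line."""
    states = {}
    for line in lines:
        match = AGENT_LINE.match(line.strip())
        if match:
            host, role, state = match.groups()
            states[host] = (role, state)
    return states


def not_running(states):
    """Hosts whose HA agent is in any state other than 'Running'."""
    return sorted(host for host, (_, state) in states.items() if state != "Running")


SAMPLE = [
    "esx-esx6 Primary Agent Running",
    "esx-esx5 Primary Agent Running",
    "esx-esx4 Primary Agent Running",
    "esx-esx3 Secondary Agent Running",
    "esx-esx2 Primary Agent Running",
    "esx-esx1 Primary Agent Running",
    "VMwareresult=success",
]
print(not_running(agent_states(SAMPLE)))  # -> []
```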

ToniK
Contributor

I had a similar problem.

In my case I came in one morning to find that the host was disconnected. I rebooted the vCenter server, after which the host reconnected, but all VMs on the host were disconnected. The vpxuser password had changed the previous evening when no one was around, and as I am the only admin here, it certainly wasn't me.

Here is how I resolved the problem (I still do not know why the password changed, though): http://communities.vmware.com/thread/188621

bisi
Contributor

Hello Toni,

Thanks for your answer. I am "glad" to hear that I am not the only one having this problem. At least I can be fairly sure that we were not attacked and that no one changed the password manually. It seems to be an issue in VMware; can anyone confirm this?

Is there any patch or workaround for this problem?

Best regards

Claude

jaygriffin
Enthusiast

Add me to the list of those who have also experienced a MYSTERIOUS password change. Searching through the forums, I have to believe something is going on with VC to cause this. A number of people have had their vpxuser password change without any explanation.

I came in today and all the VMs on a particular host were disconnected. I was seeing vpxuser password problems.

3r1k
Contributor

Same here. I have some additional information:

One host showed up as disconnected, and all its guests were disconnected. The log file reported a password change for user vpxuser, initiated from 127.0.0.1, followed by authentication errors. The same errors appeared in the logs of 2 other hosts at the same time as on the first host; however, these 2 were not disconnected, and their guests appeared normal.

I wonder why something changes the password without my knowing it. And is this sudden password change the reason that one ESX host got disconnected?

olano
Contributor

Hi Bisi,

I have the same problem as you, but my hosts are ESX 4.0 hosts. My ESX hosts frequently disconnect from vCenter.

Have you found the root cause? Can you share your solution?

Thanks!

Lan
