VMware Cloud Community
ldentella
Contributor
Contributor

ESXi disconnecting from vCenter Server

Hello!

I've a small vSphere infrastructure with only 2 ESXi (4.1.0 260247) and a Virtual Center on a Win2008 VM.

One of the two ESXi shows the following problem:

- its status is not responding and the VM it hosts are disconnected

- if I right-click and click Connect, the reconnection process starts: the host is connected, HA is configured and everything seems working

- after 1-2 minutes the host disconnects again

I followed the guidelines in the KB:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100340...

Network connection seems ok (I can ping, telnet to port 902 from vCenter)

hostd and vpxa are running (tried to restart them with /sbin/services.sh restart)

the hosted VM work fine

If I directly connect my VIClient to the ESXi, I can read the following error in the task pane:

Task: Flush firmware configuration

Status: fault.TooManyWrites.summary

How can I solve?

Thanks!

7 Replies
idle-jam
Immortal
Immortal

is there any firewall in between the host and vcenter?

Reply
0 Kudos
ldentella
Contributor
Contributor

No, they are on the same network...

ESXi 1 --> 10.0.17.101

ESXi 2 --> 10.0.17.102

vCenter Server --> 10.0.17.110

ESXi 2 works fine, while ESXi 1 is disconnecting after a couple of minutes

Reply
0 Kudos
a_p_
Leadership
Leadership

I haven't seen this error before, so I can only guess.

Is the Hypervisor installed on an USB device? Maybe there are issues with the device!? Did you configure the scratch location?

If it is installed to disk, login to the tech support console and run df -h.

André

Reply
0 Kudos
ldentella
Contributor
Contributor

Hello!

ESXi is installed on local HD (2 in mirror), server is a Dell PowerEdge R710.

An iSCSI storage (Iomega StorCenter) is connected to both ESXi.

~ # df -h
Filesystem                Size      Used Available Use% Mounted on
visorfs                   1.5G    322.2M      1.2G  21% /
vmfs3                   999.8G    341.8G    657.9G  34% /vmfs/volumes/4cf9fd54-b34a2fa0-5af4-001b217dba68
vfat                      4.0G      2.9M      4.0G   0% /vmfs/volumes/4cf8cfa4-a5a837b5-6890-b8ac6f8eda8a
vmfs3                   999.8G    332.7G    667.1G  33% /vmfs/volumes/4cf9fd66-d6bc4910-ff95-001b217dba68
vmfs3                   227.3G     30.5G    196.7G  13% /vmfs/volumes/4cf8cfed-7e720e5a-3178-b8ac6f8eda8a
vfat                    249.7M    102.1M    147.6M  41% /vmfs/volumes/2fbaf874-39d71d01-0154-499e8be39a2a
vfat                    249.7M      4.0k    249.7M   0% /vmfs/volumes/734c1e74-7723f518-51b1-09c58a055e03
vfat                    285.9M    135.5M    150.4M  47% /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a

Reply
0 Kudos
nkrishnan
Expert
Expert

hi,

Could you please check the vpxa.log and hostd.log in ESXi and vpxd.log or VC logs in windows 2008. we must be able to get some suspecious area that need to troubleshoot.

What is the last operation that system is triggering before the disconnect.

Thanks

Nithin

--Nithin
Reply
0 Kudos
ldentella
Contributor
Contributor

Hello!

(sorry for the delay but I was out of office...)

/var/log/vmware # cat hostd-195.log
Section for VMware ESX, pid=9327746, version=4.1.0, build=build-260247, option=Release
[2011-01-28 08:34:21.029 721F4B90 verbose 'Ticket 52 80 b2 2c 17 1c b7 23-a7 91 21 ff e1 82 20 48'] Ticket invalidated
[2011-01-28 08:34:25.656 721F4B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
[2011-01-28 08:34:27.198 71C81B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14782MHz/20323MB to 14782MHz/20320MB
[2011-01-28 08:34:40.024 72140B90 verbose 'Statssvc'] HostCtl exception Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.
[2011-01-28 08:35:00.026 72481B90 verbose 'Statssvc'] HostCtl exception Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.
[2011-01-28 08:35:20.024 71C81B90 verbose 'Statssvc'] HostCtl exception Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.
[2011-01-28 08:35:31.603 71C81B90 verbose 'ha-license-manager'] Load: Loading existing file: /etc/vmware/license.cfg
[2011-01-28 08:35:40.023 FFB3BE90 verbose 'Statssvc'] HostCtl exception Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.
[2011-01-28 08:35:55.939 72503B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
[2011-01-28 08:36:00.025 71A81B90 verbose 'Statssvc'] HostCtl exception Unable to complete Sysinfo operation.  Please see the VMkernel log file for more details.
/var/log/vmware/vpx # tail -n 500 vpxa-1500.log
[2011-01-28 08:34:25.238 16344B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
[2011-01-28 08:34:25.238 163C6B90 verbose 'App' opID=F869A256-00000085-1b-98] [VpxaVMAP::Invoke] Command returned successfully
[2011-01-28 08:34:25.238 16344B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2011-01-28 08:34:25.241 163C6B90 verbose 'App' opID=F869A256-00000085-1b-98] [VpxaVMAP] Waiting for initial policy 15 and resource 14 messages
[2011-01-28 08:34:25.787 16385B90 verbose 'App'] [VpxaVMAP::Notify] Message for Vpxa is <?xml version="1.0" encoding="UTF-8"?><Response><id>14</id><result>success</result></Response>
[2011-01-28 08:34:25.787 16385B90 verbose 'App'] [VpxaVMAP::Notify] Initial node resource message 14 is acknowledged
[2011-01-28 08:34:25.787 16281B90 verbose 'App'] [VpxaVMAP] WaitForResponse msgid 14 response success, state received
[2011-01-28 08:34:25.787 16385B90 verbose 'App'] [VpxaVMAP::Notify] Message for Vpxa is <?xml version="1.0" encoding="UTF-8"?><Response><id>15</id><result>success</result></Response>
[2011-01-28 08:34:25.788 16281B90 info 'App'] [VpxLRO] -- FINISH task-internal-135 --  -- ProcessAamMessage-NodeResourceRequest --
[2011-01-28 08:34:25.788 16385B90 verbose 'App'] [VpxaVMAP::Notify] Initial node policy message 15 is acknowledged
[2011-01-28 08:34:25.873 FFED7B10 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (15856): Event:VpxaHalEvent::CheckQueuedEvents
[2011-01-28 08:34:25.953 16240B90 verbose 'VpxaHalCnxHostagent'] Received callback in WaitForUpdatesDone
[2011-01-28 08:34:25.953 16240B90 verbose 'VpxaHalCnxHostagent'] [VpxaHalCnxHostagent::ProcessUpdate] Applying updates from 19165 to 19166 (at 19165)
[2011-01-28 08:34:25.953 16240B90 verbose 'App'] [VpxaHalServices] HostChanged Event Fired, properties changed [runtime.healthSystemRuntime]
[2011-01-28 08:34:27.202 16303B90 verbose 'VpxaHalCnxHostagent'] Received callback in WaitForUpdatesDone
[2011-01-28 08:34:27.203 16303B90 verbose 'VpxaHalCnxHostagent'] [VpxaHalCnxHostagent::ProcessUpdate] Applying updates from 19166 to 19167 (at 19166)
[2011-01-28 08:34:27.203 16303B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Received change notification from hostd ...
[2011-01-28 08:34:27.203 16303B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] [IsVmListChangeUpdate] returning false
[2011-01-28 08:34:27.203 16303B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Setting _syncPending = true ...
[2011-01-28 08:34:27.203 16303B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Launching ProcessResourceNotification in new thread...
[2011-01-28 08:34:27.203 16489B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Operation is transform ...
[2011-01-28 08:34:27.211 16489B90 verbose 'VpxaHalResourcePool'] FetchChildResourceSpec() done
[2011-01-28 08:34:27.211 16489B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Copied root values ...
[2011-01-28 08:34:27.212 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (2) = false
[2011-01-28 08:34:27.212 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (3) = false
[2011-01-28 08:34:27.212 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (4) = false
[2011-01-28 08:34:27.212 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (5) = false
[2011-01-28 08:34:27.213 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (6) = false
[2011-01-28 08:34:27.213 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (7) = false
[2011-01-28 08:34:27.213 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (8) = false
[2011-01-28 08:34:27.213 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (9) = false
[2011-01-28 08:34:27.214 16489B90 verbose 'App'] [VmTreeNode::Equal] ignoreOverheadLimit (10) = true
[2011-01-28 08:34:27.214 16489B90 verbose 'VpxaHalResourcePool'] [VpxaHostdSpecSync] Generated list of transforming operations  ...
[2011-01-28 08:34:27.214 16489B90 verbose 'VpxaHalResourcePool'] [OverrideTreeOnHostd] oldOperations != new operations, trying with new operation set...
[2011-01-28 08:34:27.214 16489B90 info 'VpxaHalResourcePool'] [OverrideTreeOnHostd] Trees are identical, nothing to do
[2011-01-28 08:34:27.214 16489B90 verbose 'VpxaHalResourcePool'] [OverrideTreeOnHostd] currOverrideState = 0, application succeeded, exiting...
[2011-01-28 08:34:27.214 16489B90 verbose 'VpxaHalResourcePool'] [VC-perf] Firing res pool change event, rootChanged = true
[2011-01-28 08:34:27.214 16489B90 verbose 'App'] [VpxaHalServices] ResPoolConfigChange Event fired
[2011-01-28 08:34:27.214 16489B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (15857): ResourcePool:VpxaInvtHostResPoolListener::ConfigChanged
[2011-01-28 08:34:27.214 16489B90 verbose 'VpxaHalResourcePool'] [ProcessResourceNotification] No syncs pending, exiting loop...
[2011-01-28 08:34:30.243 163C6B90 error 'App' opID=F869A256-00000085-1b-98] [VpxaVMAP] sNodePolicyID -1 sNodeResourceID -1 tries 1
[2011-01-28 08:34:30.243 163C6B90 verbose 'App' opID=F869A256-00000085-1b-98] [VpxaVMAP::SetDasState] dasstate changing from configuring to running
[2011-01-28 08:34:30.243 163C6B90 verbose 'App' opID=F869A256-00000085-1b-98] [VpxaInvtHost] Increment master gen. no to (15858): Das:VpxaVMAP::SetDasState
[2011-01-28 08:34:30.243 163C6B90 verbose 'PropertyProvider' opID=F869A256-00000085-1b-98] RecordOp ASSIGN: info.state, task-134
[2011-01-28 08:34:30.243 163C6B90 verbose 'PropertyProvider' opID=F869A256-00000085-1b-98] RecordOp ASSIGN: info.cancelable, task-134
[2011-01-28 08:34:30.243 163C6B90 verbose 'PropertyProvider' opID=F869A256-00000085-1b-98] RecordOp ASSIGN: info.result, task-134
[2011-01-28 08:34:30.244 163C6B90 info 'App' opID=F869A256-00000085-1b-98] [VpxLRO] -- FINISH task-134 --  -- vpxapi.VpxaService.enableDAS -- 52575c0f-82fc-1ab7-b3a9-b1613d18063f
[2011-01-28 08:34:30.244 163C6B90 verbose 'PropertyProvider'] RecordOp ASSIGN: info.completeTime, task-134
[2011-01-28 08:34:30.244 16448B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Complete (processed 925 bytes)
[2011-01-28 08:34:30.247 16489B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
[2011-01-28 08:34:30.247 16489B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Complete (processed 530 bytes)
[2011-01-28 08:34:30.247 16281B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Complete (processed 406 bytes)
[2011-01-28 08:34:31.283 FFED7B10 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
[2011-01-28 08:34:31.284 FFED7B10 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
[2011-01-28 08:34:31.285 16489B90 info 'App' opID=F869A256-00000085-1b-21] [VpxLRO] -- BEGIN task-internal-137 --  -- vpxapi.VpxaService.listDasPrimaries -- 52575c0f-82fc-1ab7-b3a9-b1613d18063f
[2011-01-28 08:34:31.285 16489B90 verbose 'App' opID=F869A256-00000085-1b-21] [VpxaVMAP::Invoke]Command to invoke is /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=ith-grsesx-01 -uname=VMkernel -cmd=listprimaries -domain=vmware
[2011-01-28 08:34:31.286 16489B90 info 'SysCommandPosix' opID=F869A256-00000085-1b-21] ForkExec(/opt/vmware/aam/bin/aamPerl)  10538150
[2011-01-28 08:34:31.733 16489B90 verbose 'App' opID=F869A256-00000085-1b-21] [VpxaVMAP::Invoke] task percent done is 100
[2011-01-28 08:34:32.244 16489B90 verbose 'App' opID=F869A256-00000085-1b-21] [VpxaVMAP::Invoke] Command output:
01/28/11 08:34:31 [print_args          ] Invoked command:
01/28/11 08:34:31 [print_args          ]     /opt/vmware/aam/bin/ftPerl /opt/vmware/aam/ha/aam_config_util.pl  -z -shortname=ith-grsesx-01 -uname=VMkernel -cmd=listprimaries -domain=vmware
01/28/11 08:34:31 [print_args          ] Environment:
01/28/11 08:34:31 [print_args          ]      FT_DIR=/opt/vmware/aam
01/28/11 08:34:31 [print_args          ]      FT_ISOLATION_TIME=1
01/28/11 08:34:31 [print_args          ]      GREP=/bin/grep
01/28/11 08:34:31 [print_args          ]      FT_CONFIG_DIR=/var/lib/vmware/aam
01/28/11 08:34:31 [print_args          ]      RPCINFO=/bin/rpcinfo
01/28/11 08:34:31 [print_args          ]      LD_LIBRARY_PATH=/lib:/usr/lib:/opt/vmware/aam/lib:/opt/vmware/vpxa/vpx:
01/28/11 08:34:31 [print_args          ]      PS=/bin/ps
01/28/11 08:34:31 [print_args          ]      FT_PERSISTED_CONFIG_DIR=/etc/opt/vmware/aam
01/28/11 08:34:31 [print_args          ]      PWD=/var/log/vmware/vpx
01/28/11 08:34:31 [print_args          ]      PS_OPTIONS=
01/28/11 08:34:31 [print_args          ]      FT_NO_CONSOLE_TRACE=1
01/28/11 08:34:31 [print_args          ]      PATH=/sbin:/usr/sbin:/bin:/usr/bin:/opt/vmware/aam/bin:/bin
01/28/11 08:34:31 [print_args          ]      FT_LOG_DIR=/var/log/vmware/aam
01/28/11 08:34:31 [print_args          ]      FT_DOMAIN=vmware
01/28/11 08:34:31 [print_args          ] Parsed arguments:
01/28/11 08:34:31 [print_args          ]      cmd=listprimaries
01/28/11 08:34:31 [print_args          ]      -z=1
01/28/11 08:34:31 [print_args          ]      uname=VMkernel
01/28/11 08:34:31 [print_args          ]      shortname=ith-grsesx-01
01/28/11 08:34:31 [print_args          ]      domain=vmware
01/28/11 08:34:31 [issue_cmd           ] CMD:    /opt/vmware/aam/bin/ft_gethostbyname ith-grsesx-01 |grep FAILED
01/28/11 08:34:31 [issue_cmd           ] STATUS: 1
01/28/11 08:34:31 [issue_cmd           ] RESULT:
01/28/11 08:34:31 [issue_cmd           ]
01/28/11 08:34:31 [list_primary_nodes  ]
01/28/11 08:34:31 [issue_cli_cmd       ] command is '/opt/vmware/aam/bin/ftcli -domain vmware -timeout 15 -cmd listnodes'
01/28/11 08:34:31 [issue_cmd           ] CMD:    /opt/vmware/aam/bin/ftcli -domain vmware -timeout 15 -cmd listnodes
01/28/11 08:34:31 [issue_cmd           ] STATUS: 0
01/28/11 08:34:31 [issue_cmd           ] RESULT:
01/28/11 08:34:31 [issue_cmd           ] *** Node ith-grsesx-02 is the master primary ***
01/28/11 08:34:31 [issue_cmd           ]         Node              Type              State
01/28/11 08:34:31 [issue_cmd           ] -----------------------  ------------    --------------
01/28/11 08:34:31 [issue_cmd           ]   ith-grsesx-01          Primary      Agent Running
01/28/11 08:34:31 [issue_cmd           ]   ith-grsesx-02          Primary      Agent Running
01/28/11 08:34:31 [issue_cmd           ]
VMwarelistprimaries=ith-grsesx-01,ith-grsesx-02
01/28/11 08:34:31 [myexit              ] VMwareresult=success
01/28/11 08:34:31 [elapsed_time        ] Total time for script to complete:  0 minute(s) and 0 second(s)
[2011-01-28 08:34:32.244 16489B90 verbose 'App' opID=F869A256-00000085-1b-21] [VpxaVMAP::Invoke] Command returned successfully
[2011-01-28 08:34:32.244 16489B90 verbose 'App' opID=F869A256-00000085-1b-21] [VpxaInvtHost] Increment master gen. no to (15859): Das:VpxaVMAP::ListPrimaries
[2011-01-28 08:34:32.244 16489B90 verbose 'SoapAdapter.HTTPService' opID=F869A256-00000085-1b-21] HTTP Response: Complete (processed 470 bytes)
[2011-01-28 08:34:32.244 16489B90 info 'App' opID=F869A256-00000085-1b-21] [VpxLRO] -- FINISH task-internal-137 --  -- vpxapi.VpxaService.listDasPrimaries -- 52575c0f-82fc-1ab7-b3a9-b1613d18063f
[2011-01-28 08:34:34.792 161BEB90 verbose 'App'] [VpxaAAM::PollForVMAPCommands] Running ftExecute
[2011-01-28 08:34:34.800 16448B90 warning 'App'] [VpxaHalStats] Unexpected return result. Expect 1 sample, receive 2
[2011-01-28 08:34:34.802 16448B90 verbose 'App'] Set internal stats for VM: 2 (vpxa VM id), 42 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.804 16448B90 verbose 'App'] Set internal stats for VM: 3 (vpxa VM id), 43 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.806 16448B90 verbose 'App'] Set internal stats for VM: 4 (vpxa VM id), 44 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.808 16448B90 verbose 'App'] Set internal stats for VM: 5 (vpxa VM id), 45 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.810 16448B90 verbose 'App'] Set internal stats for VM: 6 (vpxa VM id), 56 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.811 16448B90 verbose 'App'] Set internal stats for VM: 7 (vpxa VM id), 61 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.813 16448B90 verbose 'App'] Set internal stats for VM: 8 (vpxa VM id), 46 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:34.815 16448B90 verbose 'App'] Set internal stats for VM: 9 (vpxa VM id), 67 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:40.705 1617DB90 verbose 'App'] [VpxaVmFailover::ProcessQuickStats]  stats are for a gray vm ... ignoring
[2011-01-28 08:34:52.368 16240B90 verbose 'App'] [VpxaVMAP] Monitoring AAM health: vpxdDasStateOnLastInvocation(initialized) currentVpxdDasState(initialized) forceRunOfListNodes(0) isDasEnabled(1) skipOperation(1)
[2011-01-28 08:34:54.800 16385B90 warning 'App'] [VpxaHalStats] Unexpected return result. Expect 1 sample, receive 2
[2011-01-28 08:34:54.802 16385B90 verbose 'App'] Set internal stats for VM: 2 (vpxa VM id), 42 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.804 16385B90 verbose 'App'] Set internal stats for VM: 3 (vpxa VM id), 43 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.806 16385B90 verbose 'App'] Set internal stats for VM: 4 (vpxa VM id), 44 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.808 16385B90 verbose 'App'] Set internal stats for VM: 5 (vpxa VM id), 45 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.810 16385B90 verbose 'App'] Set internal stats for VM: 6 (vpxa VM id), 56 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.811 16385B90 verbose 'App'] Set internal stats for VM: 7 (vpxa VM id), 61 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.813 16385B90 verbose 'App'] Set internal stats for VM: 8 (vpxa VM id), 46 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:34:54.815 16385B90 verbose 'App'] Set internal stats for VM: 9 (vpxa VM id), 67 (vpxd VM id). Is FT primary? 0
[2011-01-28 08:35:00.709 1617DB90 verbose 'App'] [vpxaVmFailover::NotifyVmFailover] vm health monitoring is suspended

C:\ProgramData\VMware\VMware VirtualCenter\Logs

[2011-01-28 09:33:16.763 02384 info 'App'] [VpxLRO] -- BEGIN task-internal-5566 -- host-9 -- VpxdInvtHostSyncHostLRO.Synchronize --

[2011-01-28 09:33:16.865 02384 info 'App'] [VpxdMoHost::UpdateConfigInt] Marked 10.0.17.101 as dirty.

[2011-01-28 09:33:16.948 02384 info 'App'] [VpxLRO] -- FINISH task-internal-5566 -- host-9 -- VpxdInvtHostSyncHostLRO.Synchronize --

[2011-01-28 09:33:20.362 06884 info 'App' opID=F869A256-000000BF] [VpxLRO] -- BEGIN task-internal-5567 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.363 06884 info 'App' opID=F869A256-000000BF] [VpxLRO] -- FINISH task-internal-5567 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.363 11980 info 'App' opID=F869A256-000000C0] [VpxLRO] -- BEGIN task-internal-5568 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.364 11980 info 'App' opID=F869A256-000000C0] [VpxLRO] -- FINISH task-internal-5568 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.364 04364 info 'App' opID=F869A256-000000C1] [VpxLRO] -- BEGIN task-internal-5569 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.365 04364 info 'App' opID=F869A256-000000C1] [VpxLRO] -- FINISH task-internal-5569 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.365 10256 info 'App' opID=F869A256-000000C2] [VpxLRO] -- BEGIN task-internal-5570 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.366 10256 info 'App' opID=F869A256-000000C2] [VpxLRO] -- FINISH task-internal-5570 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.370 08492 info 'App' opID=F869A256-000000C3] [VpxLRO] -- BEGIN task-internal-5571 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.371 08492 info 'App' opID=F869A256-000000C3] [VpxLRO] -- FINISH task-internal-5571 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.379 11448 info 'App' opID=F869A256-000000C4] [VpxLRO] -- BEGIN task-internal-5572 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.379 11448 info 'App' opID=F869A256-000000C4] [VpxLRO] -- FINISH task-internal-5572 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.409 11808 info 'App' opID=F869A256-000000C6] [VpxLRO] -- BEGIN task-internal-5573 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.410 11808 info 'App' opID=F869A256-000000C6] [VpxLRO] -- FINISH task-internal-5573 --  -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.421 06884 info 'App' opID=F869A256-000000C8] [VpxLRO] -- BEGIN task-internal-5574 --  -- vmodl.query.PropertyCollector.retrieveContents -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.422 06884 info 'App' opID=F869A256-000000C8] [VpxLRO] -- FINISH task-internal-5574 --  -- vmodl.query.PropertyCollector.retrieveContents -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.426 11980 info 'App' opID=F869A256-000000CA] [VpxLRO] -- BEGIN task-internal-5575 --  -- vim.LicenseAssignmentManager.queryAssignedLicenses -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.427 11980 info 'App' opID=F869A256-000000CA] [VpxLRO] -- FINISH task-internal-5575 --  -- vim.LicenseAssignmentManager.queryAssignedLicenses -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.795 08592 info 'App' opID=F869A256-000000BE] [VpxLRO] -- BEGIN task-internal-5576 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:20.795 08592 info 'App' opID=F869A256-000000BE] [VpxLRO] -- FINISH task-internal-5576 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:33:27.929 02384 info 'StackTracer'] [2384] Enter DAS_PROFILE UpdateDasStatus

[2011-01-28 09:33:27.930 02384 info 'StackTracer'] [2384] Exit DAS_PROFILE UpdateDasStatus (1 ms)

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] SQL execution took too long: INSERT INTO VPX_BINARY_DATA WITH (ROWLOCK) (ID, DATA_TYPE, BIN_DATA, CREATED_TIME, CHANGE_ID) VALUES (?, ?, ?, ?, ?)

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] Execution elapsed time: 18571 ms

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] Bind parameters:

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 1, size: 4, arraySize: 0

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = 605

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 11, size: 30, arraySize: 0

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = "Ldap.BackupData"

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 12, size: -39887066, arraySize: 0

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 10, size: 23, arraySize: 0

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 11, size: 10, arraySize: 0

[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = "36999"

[2011-01-28 09:33:31.472 06280 warning 'App'] [DebugTime] VpxdLdap:WriteBackup took 19 sec.

[2011-01-28 09:33:31.489 06280 info 'App'] [VpxdLdap] Backed up LDAP data to database.

[2011-01-28 09:33:31.558 06280 info 'App'] [VpxdLdap] Removed old LDAP backup 602.

[2011-01-28 09:33:42.052 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Exit DAS_PROFILE Config (69112 ms)

[2011-01-28 09:33:43.063 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost] vpxd DAS state on host 10.0.17.101 changed to running

[2011-01-28 09:33:43.063 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateDasStateInt] Marked 10.0.17.101 as dirty.

[2011-01-28 09:33:43.066 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateVpxaDasStateInt] vpxa DAS state on host 10.0.17.101 changed to running

[2011-01-28 09:33:43.066 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateVpxaDasStateInt] Marked 10.0.17.101 as dirty.

[2011-01-28 09:33:43.320 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Enter DAS_PROFILE UpdateDasStatus

[2011-01-28 09:33:43.320 03088 info 'App' opID=F869A256-00000085-1b] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running

[2011-01-28 09:33:43.320 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Exit DAS_PROFILE UpdateDasStatus (0 ms)

[2011-01-28 09:33:43.320 03088 info 'App' opID=F869A256-00000085-1b] [VpxLRO] -- FINISH task-488 -- host-9 -- DasConfig.ConfigureHost --

[2011-01-28 09:34:20.668 08492 info 'App' opID=f7b5783a] [VpxLRO] -- BEGIN task-internal-5577 -- host-83 -- vim.HostSystem.acquireCimServicesTicket -- 505F7A9F-BF04-4736-88A6-DFBDABEDF9B1(EA8DFFCD-1332-46CB-807F-A82CC0151FE8)

[2011-01-28 09:34:20.684 08492 info 'App' opID=f7b5783a] [VpxLRO] -- FINISH task-internal-5577 -- host-83 -- vim.HostSystem.acquireCimServicesTicket -- 505F7A9F-BF04-4736-88A6-DFBDABEDF9B1(EA8DFFCD-1332-46CB-807F-A82CC0151FE8)

[2011-01-28 09:34:45.604 03632 warning 'ProxySvc Req31269'] The client closed the stream with a request not fully forwarded.

[2011-01-28 09:35:00.001 08592 info 'App'] [VpxdStatsCollector::CollectRemote] 2 threads doing remote collection

[2011-01-28 09:35:45.592 01900 warning 'ProxySvc Req31272'] The client closed the stream with a request not fully forwarded.

[2011-01-28 09:36:06.283 06280 info 'App'] [LicMgr] No vm-based licenses. So skipping collection.

[2011-01-28 09:36:15.233 10256 info 'App'] [ClusterCompatibilitySet::ComputeCompatibilitySet] Before lock Dirty hosts: 1 ; Dirty VMs: 0 ; Clean hosts: 1 ; Clean VMs: 9

[2011-01-28 09:36:15.447 10256 info 'App'] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running

[2011-01-28 09:36:45.596 01900 warning 'ProxySvc Req31275'] The client closed the stream with a request not fully forwarded.

[2011-01-28 09:37:10.074 01900 info 'App' opID=F869A256-000000D1] [VpxLRO] -- BEGIN task-internal-5580 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:37:10.074 01900 info 'App' opID=F869A256-000000D1] [VpxLRO] -- FINISH task-internal-5580 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:37:10.076 03088 info 'App' opID=F869A256-000000D2] [VpxLRO] -- BEGIN task-internal-5581 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:37:10.076 03088 info 'App' opID=F869A256-000000D2] [VpxLRO] -- FINISH task-internal-5581 --  -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:37:37.893 08592 warning 'App'] [SwitchTempTable] Next switch time = 299812500ms later,

[2011-01-28 09:38:10.086 01900 warning 'ProxySvc Req31278'] The client closed the stream with a request not fully forwarded.

[2011-01-28 09:38:45.647 04364 info 'App' opID=F869A256-000000D3] [VpxLRO] -- BEGIN task-internal-5582 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:38:45.648 04364 error 'App' opID=F869A256-000000D3] Connection lost while waiting for the next request on stream TCPStreamWin32(socket=TCP(fd=1904) local=[::1]:8085,  peer=[::1]:57605): class Vmacore::SystemException(An established connection was aborted by the software in your host machine. )

[2011-01-28 09:38:45.648 04364 info 'App' opID=F869A256-000000D3] [VpxLRO] -- FINISH task-internal-5582 --  -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)

[2011-01-28 09:39:45.654 01900 warning 'ProxySvc Req31282'] The client closed the stream with a request not fully forwarded.

[2011-01-28 09:40:00.002 08592 info 'App'] [VpxdStatsCollector::CollectRemote] 2 threads doing remote collection

[2011-01-28 09:41:07.094 08592 error 'App'] [Vdb::Connection::Commit] Connection commit failed: -1. Start retrying 20 more times.

[2011-01-28 09:41:15.233 06280 info 'App'] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running

[2011-01-28 09:41:15.663 01900 warning 'ProxySvc Req31286'] The client closed the stream with a request not fully forwarded.

Reply
0 Kudos
ignos
Contributor
Contributor

Hi there,

Today I found out this thread regarding the same problem, after searching for this vpxa.log message "[VpxaVmFailover::ProcessQuickStats]  stats are for a gray vm" that also shows up in the log you posted.

In my opinion, an HA failover procedure has been triggered because the HA agent of the disconnected ESXi hosts couldn't gather any statistical data of several virtual machines, having at the same time some contention  (the 'Toomanywrites' message looks like a bad symptom too).

The network is OK (if you check your log, there is a successful ping test there), but something is preventing the HA agent to reach any vm metrics on the way ("stats are for a gray vm"). On the other hand, the hosts are OK but they fail to report back to VCenter and then the disconnection comes...

This HA failover behavior should be mostly normal, if you have this HA setting enabled (VM Monitoring only, or VM and Application Monitoring):

captura184.png

On my musings about this problem, I found several things that at the end worked for me:

- If you use VM Monitoring or VM and Application Monitoring, check out that all your vm's have the vmware tools installed. Without them it looks like that you could have those HA failover events in the future (not mandatory, but possible when there is contention, at least at the storage side). Otherwise if you cannot modify your VM's, it would be better to disable it.

- If your VCenter Server has several IP addresses, you should set up which is the Managed IP Address. You can follow KB1008030.

- Check out any performance problems, specially at the storage layer. In our case we had contention at the storage layer, because our filer was handling too many volume snapshots at the same time. At that time, vm's were having a sluggish performance: fixing our storage problems helped a lot to avoid this issue too.

Let's hope it works for you too, please let me know.

Best regards,

Nacho