Hello!
I have a small vSphere infrastructure with only two ESXi hosts (4.1.0, build 260247) and a vCenter Server on a Win2008 VM.
One of the two ESXi hosts shows the following problem:
- its status is "not responding" and the VMs it hosts are disconnected
- if I right-click and choose Connect, the reconnection process starts: the host connects, HA is configured, and everything seems to work
- after 1-2 minutes the host disconnects again
I followed the guidelines in the KB:
Network connection seems OK (I can ping and telnet to port 902 from vCenter)
hostd and vpxa are running (I tried restarting them with /sbin/services.sh restart)
the hosted VMs work fine
If I connect my VI Client directly to the ESXi host, I can read the following error in the task pane:
Task: Flush firmware configuration
Status: fault.TooManyWrites.summary
How can I solve this?
Thanks!
Is there any firewall between the host and vCenter?
No, they are on the same network...
ESXi 1 --> 10.0.17.101
ESXi 2 --> 10.0.17.102
vCenter Server --> 10.0.17.110
ESXi 2 works fine, while ESXi 1 keeps disconnecting after a couple of minutes.
I haven't seen this error before, so I can only guess.
Is the hypervisor installed on a USB device? Maybe there are issues with the device!? Did you configure the scratch location?
If it is installed to disk, log in to the tech support console and run df -h.
André
Hello!
ESXi is installed on local HDs (two disks in a mirror); the server is a Dell PowerEdge R710.
An iSCSI storage array (Iomega StorCenter) is connected to both ESXi hosts.
~ # df -h
Filesystem Size Used Available Use% Mounted on
visorfs 1.5G 322.2M 1.2G 21% /
vmfs3 999.8G 341.8G 657.9G 34% /vmfs/volumes/4cf9fd54-b34a2fa0-5af4-001b217dba68
vfat 4.0G 2.9M 4.0G 0% /vmfs/volumes/4cf8cfa4-a5a837b5-6890-b8ac6f8eda8a
vmfs3 999.8G 332.7G 667.1G 33% /vmfs/volumes/4cf9fd66-d6bc4910-ff95-001b217dba68
vmfs3 227.3G 30.5G 196.7G 13% /vmfs/volumes/4cf8cfed-7e720e5a-3178-b8ac6f8eda8a
vfat 249.7M 102.1M 147.6M 41% /vmfs/volumes/2fbaf874-39d71d01-0154-499e8be39a2a
vfat 249.7M 4.0k 249.7M 0% /vmfs/volumes/734c1e74-7723f518-51b1-09c58a055e03
vfat 285.9M 135.5M 150.4M 47% /vmfs/volumes/3c3693e8-f77a642a-1910-5c6bdcb26d3a
Hi,
Could you please check vpxa.log and hostd.log on the ESXi host, and vpxd.log (the VC logs) on the Windows 2008 machine? We should be able to find some suspicious area that needs troubleshooting.
What is the last operation the system triggers before the disconnect?
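To narrow it down, you can filter the agent log from the tech support console around the disconnect time. A minimal sketch, assuming the usual ESXi 4.x log location /var/log/vmware/vpxa.log; the two sample lines below are made-up stand-ins, only there to demonstrate the filter, so point the grep at your real file instead:

```shell
# Hypothetical stand-in for /var/log/vmware/vpxa.log, just to show the filter.
cat > /tmp/vpxa-sample.log <<'EOF'
[2011-01-28 09:33:43.320 'App'] [VpxaVmFailover::ProcessQuickStats] stats are for a gray vm
[2011-01-28 09:33:44.100 'App'] info: heartbeat sent to vCenter
EOF

# Keep only the suspicious lines (errors, warnings, failover chatter)
# and drop the routine info noise:
grep -iE 'error|warning|failover|gray vm' /tmp/vpxa-sample.log
```

On the real file it helps to restrict the search to the 1-2 minutes before vCenter marks the host as not responding.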
Thanks
Nithin
Hello!
(sorry for the delay but I was out of office...)
Here is an excerpt from vpxd.log, under C:\ProgramData\VMware\VMware VirtualCenter\Logs:
[2011-01-28 09:33:16.763 02384 info 'App'] [VpxLRO] -- BEGIN task-internal-5566 -- host-9 -- VpxdInvtHostSyncHostLRO.Synchronize --
[2011-01-28 09:33:16.865 02384 info 'App'] [VpxdMoHost::UpdateConfigInt] Marked 10.0.17.101 as dirty.
[2011-01-28 09:33:16.948 02384 info 'App'] [VpxLRO] -- FINISH task-internal-5566 -- host-9 -- VpxdInvtHostSyncHostLRO.Synchronize --
[2011-01-28 09:33:20.362 06884 info 'App' opID=F869A256-000000BF] [VpxLRO] -- BEGIN task-internal-5567 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.363 06884 info 'App' opID=F869A256-000000BF] [VpxLRO] -- FINISH task-internal-5567 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.363 11980 info 'App' opID=F869A256-000000C0] [VpxLRO] -- BEGIN task-internal-5568 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.364 11980 info 'App' opID=F869A256-000000C0] [VpxLRO] -- FINISH task-internal-5568 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.364 04364 info 'App' opID=F869A256-000000C1] [VpxLRO] -- BEGIN task-internal-5569 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.365 04364 info 'App' opID=F869A256-000000C1] [VpxLRO] -- FINISH task-internal-5569 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.365 10256 info 'App' opID=F869A256-000000C2] [VpxLRO] -- BEGIN task-internal-5570 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.366 10256 info 'App' opID=F869A256-000000C2] [VpxLRO] -- FINISH task-internal-5570 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.370 08492 info 'App' opID=F869A256-000000C3] [VpxLRO] -- BEGIN task-internal-5571 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.371 08492 info 'App' opID=F869A256-000000C3] [VpxLRO] -- FINISH task-internal-5571 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.379 11448 info 'App' opID=F869A256-000000C4] [VpxLRO] -- BEGIN task-internal-5572 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.379 11448 info 'App' opID=F869A256-000000C4] [VpxLRO] -- FINISH task-internal-5572 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.409 11808 info 'App' opID=F869A256-000000C6] [VpxLRO] -- BEGIN task-internal-5573 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.410 11808 info 'App' opID=F869A256-000000C6] [VpxLRO] -- FINISH task-internal-5573 -- -- vmodl.query.PropertyCollector.createFilter -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.421 06884 info 'App' opID=F869A256-000000C8] [VpxLRO] -- BEGIN task-internal-5574 -- -- vmodl.query.PropertyCollector.retrieveContents -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.422 06884 info 'App' opID=F869A256-000000C8] [VpxLRO] -- FINISH task-internal-5574 -- -- vmodl.query.PropertyCollector.retrieveContents -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.426 11980 info 'App' opID=F869A256-000000CA] [VpxLRO] -- BEGIN task-internal-5575 -- -- vim.LicenseAssignmentManager.queryAssignedLicenses -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.427 11980 info 'App' opID=F869A256-000000CA] [VpxLRO] -- FINISH task-internal-5575 -- -- vim.LicenseAssignmentManager.queryAssignedLicenses -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.795 08592 info 'App' opID=F869A256-000000BE] [VpxLRO] -- BEGIN task-internal-5576 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:20.795 08592 info 'App' opID=F869A256-000000BE] [VpxLRO] -- FINISH task-internal-5576 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:33:27.929 02384 info 'StackTracer'] [2384] Enter DAS_PROFILE UpdateDasStatus
[2011-01-28 09:33:27.930 02384 info 'StackTracer'] [2384] Exit DAS_PROFILE UpdateDasStatus (1 ms)
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] SQL execution took too long: INSERT INTO VPX_BINARY_DATA WITH (ROWLOCK) (ID, DATA_TYPE, BIN_DATA, CREATED_TIME, CHANGE_ID) VALUES (?, ?, ?, ?, ?)
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] Execution elapsed time: 18571 ms
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] Bind parameters:
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 1, size: 4, arraySize: 0
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = 605
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 11, size: 30, arraySize: 0
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = "Ldap.BackupData"
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 12, size: -39887066, arraySize: 0
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 10, size: 23, arraySize: 0
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] datatype: 11, size: 10, arraySize: 0
[2011-01-28 09:33:30.900 06280 warning 'App'] [VdbStatement] value = "36999"
[2011-01-28 09:33:31.472 06280 warning 'App'] [DebugTime] VpxdLdap:WriteBackup took 19 sec.
[2011-01-28 09:33:31.489 06280 info 'App'] [VpxdLdap] Backed up LDAP data to database.
[2011-01-28 09:33:31.558 06280 info 'App'] [VpxdLdap] Removed old LDAP backup 602.
[2011-01-28 09:33:42.052 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Exit DAS_PROFILE Config (69112 ms)
[2011-01-28 09:33:43.063 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost] vpxd DAS state on host 10.0.17.101 changed to running
[2011-01-28 09:33:43.063 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateDasStateInt] Marked 10.0.17.101 as dirty.
[2011-01-28 09:33:43.066 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateVpxaDasStateInt] vpxa DAS state on host 10.0.17.101 changed to running
[2011-01-28 09:33:43.066 03088 info 'App' opID=F869A256-00000085-1b] [VpxdMoHost::UpdateVpxaDasStateInt] Marked 10.0.17.101 as dirty.
[2011-01-28 09:33:43.320 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Enter DAS_PROFILE UpdateDasStatus
[2011-01-28 09:33:43.320 03088 info 'App' opID=F869A256-00000085-1b] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running
[2011-01-28 09:33:43.320 03088 info 'StackTracer' opID=F869A256-00000085-1b] [3088] Exit DAS_PROFILE UpdateDasStatus (0 ms)
[2011-01-28 09:33:43.320 03088 info 'App' opID=F869A256-00000085-1b] [VpxLRO] -- FINISH task-488 -- host-9 -- DasConfig.ConfigureHost --
[2011-01-28 09:34:20.668 08492 info 'App' opID=f7b5783a] [VpxLRO] -- BEGIN task-internal-5577 -- host-83 -- vim.HostSystem.acquireCimServicesTicket -- 505F7A9F-BF04-4736-88A6-DFBDABEDF9B1(EA8DFFCD-1332-46CB-807F-A82CC0151FE8)
[2011-01-28 09:34:20.684 08492 info 'App' opID=f7b5783a] [VpxLRO] -- FINISH task-internal-5577 -- host-83 -- vim.HostSystem.acquireCimServicesTicket -- 505F7A9F-BF04-4736-88A6-DFBDABEDF9B1(EA8DFFCD-1332-46CB-807F-A82CC0151FE8)
[2011-01-28 09:34:45.604 03632 warning 'ProxySvc Req31269'] The client closed the stream with a request not fully forwarded.
[2011-01-28 09:35:00.001 08592 info 'App'] [VpxdStatsCollector::CollectRemote] 2 threads doing remote collection
[2011-01-28 09:35:45.592 01900 warning 'ProxySvc Req31272'] The client closed the stream with a request not fully forwarded.
[2011-01-28 09:36:06.283 06280 info 'App'] [LicMgr] No vm-based licenses. So skipping collection.
[2011-01-28 09:36:15.233 10256 info 'App'] [ClusterCompatibilitySet::ComputeCompatibilitySet] Before lock Dirty hosts: 1 ; Dirty VMs: 0 ; Clean hosts: 1 ; Clean VMs: 9
[2011-01-28 09:36:15.447 10256 info 'App'] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running
[2011-01-28 09:36:45.596 01900 warning 'ProxySvc Req31275'] The client closed the stream with a request not fully forwarded.
[2011-01-28 09:37:10.074 01900 info 'App' opID=F869A256-000000D1] [VpxLRO] -- BEGIN task-internal-5580 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:37:10.074 01900 info 'App' opID=F869A256-000000D1] [VpxLRO] -- FINISH task-internal-5580 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:37:10.076 03088 info 'App' opID=F869A256-000000D2] [VpxLRO] -- BEGIN task-internal-5581 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:37:10.076 03088 info 'App' opID=F869A256-000000D2] [VpxLRO] -- FINISH task-internal-5581 -- -- vmodl.query.PropertyCollector.Filter.destroy -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:37:37.893 08592 warning 'App'] [SwitchTempTable] Next switch time = 299812500ms later,
[2011-01-28 09:38:10.086 01900 warning 'ProxySvc Req31278'] The client closed the stream with a request not fully forwarded.
[2011-01-28 09:38:45.647 04364 info 'App' opID=F869A256-000000D3] [VpxLRO] -- BEGIN task-internal-5582 -- -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:38:45.648 04364 error 'App' opID=F869A256-000000D3] Connection lost while waiting for the next request on stream TCPStreamWin32(socket=TCP(fd=1904) local=[::1]:8085, peer=[::1]:57605): class Vmacore::SystemException(An established connection was aborted by the software in your host machine. )
[2011-01-28 09:38:45.648 04364 info 'App' opID=F869A256-000000D3] [VpxLRO] -- FINISH task-internal-5582 -- -- vmodl.query.PropertyCollector.cancelWaitForUpdates -- C89EE530-39FF-4E93-8B31-EB12D9DFE1A7(2D8E6E74-C205-4CE3-8C40-4B3C41F65DDB)
[2011-01-28 09:39:45.654 01900 warning 'ProxySvc Req31282'] The client closed the stream with a request not fully forwarded.
[2011-01-28 09:40:00.002 08592 info 'App'] [VpxdStatsCollector::CollectRemote] 2 threads doing remote collection
[2011-01-28 09:41:07.094 08592 error 'App'] [Vdb::Connection::Commit] Connection commit failed: -1. Start retrying 20 more times.
[2011-01-28 09:41:15.233 06280 info 'App'] [VpxdDas] Bad primary ith-grsesx-01: connected=0, dasState=running, vpxaDasState=running
[2011-01-28 09:41:15.663 01900 warning 'ProxySvc Req31286'] The client closed the stream with a request not fully forwarded.
Hi there,
Today I found this thread while searching for the vpxa.log message "[VpxaVmFailover::ProcessQuickStats] stats are for a gray vm", which also shows up in the log you posted; we are seeing the same problem.
In my opinion, an HA failover procedure was triggered because the HA agent on the disconnected ESXi host couldn't gather any statistics for several virtual machines while there was some contention at the same time (the 'TooManyWrites' message looks like a bad symptom too).
The network is OK (your log shows a successful ping test), but something is preventing the HA agent from reading any VM metrics ("stats are for a gray vm"). On the other hand, the hosts themselves are OK, but they fail to report back to vCenter and then the disconnection follows...
This HA failover behavior should be mostly normal if you have this HA setting enabled (VM Monitoring only, or VM and Application Monitoring).
While digging into this problem, I found several things that in the end worked for me:
- If you use VM Monitoring or VM and Application Monitoring, check that all your VMs have VMware Tools installed. Without them it looks like you could hit these HA failover events in the future (not guaranteed, but possible when there is contention, at least on the storage side). If you cannot modify your VMs, it may be better to disable this setting.
- If your vCenter Server has several IP addresses, you should set which one is the Managed IP Address. You can follow KB1008030.
- Check for any performance problems, especially at the storage layer. In our case we had contention at the storage layer because our filer was handling too many volume snapshots at the same time, and the VMs had sluggish performance; fixing our storage problems helped a lot to avoid this issue too.
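For the second point, the fix boils down to one vCenter setting. The key name below is from memory, so please verify it against KB1008030; the value shown is simply the vCenter IP from this thread:

```
# vSphere Client: Administration > vCenter Server Settings >
# Advanced Settings (key name assumed, check the KB):
VirtualCenter.ManagedIP = 10.0.17.110
```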
I hope it works for you too; please let me know.
Best regards,
Nacho