1 2 Previous Next 16 Replies Latest reply on Jan 18, 2013 4:01 AM by Sreec

    hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)

    einstein-a-go-go Hot Shot
    vExpert

      I've got a very odd installation, it's been stable for 6 months, I'm not sure if this is conciendence after implementing VDP, but now after 45 seconds, the ESXi 5.1.0 Build 799733 hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding.

       

      I can right click each host, and issue a Connect and the Servers are re-connected, and then after 45 seconds, appear as not responding again.

       

      The vCenter is a virtual machine, on the cluster of two Hosts, using VMXNET3 interface, fully patched.

       

      from the vCenter server at the time of the disconnects or not responding, I'm not seeing any ping timeouts to the hosts, from vCenter VM to Hosts, I can also connect directly to the ESXi servers, with vSphere Client, and no issues there, I remain connected.

       

      DNS is okay, Default Gateway is okay, database being used is default install with vCenter (simple install), SQL Express, database is not full or reached it's limit.

       

      nothing in Windows Event Log...

       

      things tried

       

      1. Restarted ESXi hosts

      2. Restart Network Management agents

      3. Upgraded vCenter Server to Build Build 947673

      4. restarted vCenter Server

      5. Shutdown all VMs, except vCenter Server including VDP.

       

      not done hosts yet!

       

      in the vpxd logs I can see the following:- so I can see I think the disconnection, but no idea as to why!

       

      2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx003.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
      2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx004.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
      2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5203136b-c37d-c238-69c3-7101151dae9b within 4398798518000 ms
      2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5274675a-2fb6-a9ae-7787-9f24c69ff6e5 within 4398798518000 ms
      2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-28
      2013-01-17T13:08:39.659Z [04264 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
      2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
      2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
      2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-28
      2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
      2013-01-17T13:08:39.659Z [04264 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-28, marking host as notResponding
      2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-24
      2013-01-17T13:08:39.659Z [06620 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --
      2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
      2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
      2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-24
      2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
      2013-01-17T13:08:39.659Z [06620 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-24, marking host as notResponding
      2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-28
      2013-01-17T13:08:39.784Z [04264 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-28 as dirty.
      2013-01-17T13:08:39.784Z [04264 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
      2013-01-17T13:08:39.800Z [04264 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
      2013-01-17T13:08:39.831Z [06620 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-24
      2013-01-17T13:08:39.987Z [06620 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-24 as dirty.
      2013-01-17T13:08:39.987Z [06620 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
      2013-01-17T13:08:40.003Z [06620 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --

       

      any ideas?

        • 1. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
          zXi_Gamer Master

          Well one clue here is that if it is happening exactly after 45 secs, then it must be some agent which is threading at regular intervals. Also, since there are no timeouts, one possible reason might be the vpxa[VC] recognizes that the hosts are not meeting some requirement and then disconnects them.

           

          do check on the hosts vmkernel log file before the disconnect to confirm any information. In your cluster settings, can you do a quick "apply recommendation" and find if any requirement is not met, like FDM or HA.

          • 3. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
            einstein-a-go-go Hot Shot
            vExpert

            Standard LAN, no NAT or firewalls.

             

            also tried remvoing hosts from vCenter, and re-adding.

            checking managed IP Address

            checking managed vcenter fqdn

            hosts can ping vCenter by fqdn

            hosts can ping vCenter by IP Address

            • 4. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
              Sreec Master
              Community WarriorsvExpert

              Hi ,

                   From the logs i can see 2013-01-17T13:08:39.784Z [04264 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-28 as dirty.

               

              Hope that is the correct host?

               

              1.If network connectivity is fine btwn ESX&VC,i would suggest you to go ahead and reinstall the vpxa agent (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031919)

               

              Note:When you connect the host back,the agent will be automatically pushed

               

              2.If the issue still persist,please do post the vpxd logs and hostd logs

               

              Note:If possible increase the VC logging level to verbose/trivia prior to log capture

               

              3.Do you have NAT configured btwn Host/VC? If so,its unsupported configuration.However there is a small

              tweaking that we can do make it work!!!

              • 5. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                zXi_Gamer Master

                Hmmm.. one particular message

                2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx'  opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host  5203136b-c37d-c238-69c3-7101151dae9b within 4398798518000 ms

                If there are no heartbeats for 4398798518000ms, i.e 7 mins if I am correct, then can you confirm if the hostd is working in the servers. Do check on var/log/hostd.log and check if the hostd crashes with any information..

                 

                I can also suggest to raise a support request with VMware if this is affecting production environment

                1 person found this helpful
                • 6. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                  einstein-a-go-go Hot Shot
                  vExpert

                  very little happending in vmkernel.log at time of disconnect

                   

                  2013-01-17T13:41:03.815Z cpu5:5120)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
                  2013-01-17T13:41:04.613Z cpu6:7914)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
                  2013-01-17T13:42:23.730Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d33c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:42:23.730Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d33c0) 0x1a, CmdSN 0x1e36 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:42:23.754Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d33c0, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:42:23.754Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007d33c0) 0x1a, CmdSN 0x1e38 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:42:24.389Z cpu1:7841)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
                  2013-01-17T13:43:03.680Z cpu7:5167)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
                  2013-01-17T13:43:04.475Z cpu1:7841)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
                  2013-01-17T13:44:21.227Z cpu4:5123)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
                  2013-01-17T13:45:04.135Z cpu2:7914)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
                  2013-01-17T13:45:04.927Z cpu7:7925)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
                  2013-01-17T13:47:23.746Z cpu2:4098)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c3d00, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:47:23.746Z cpu2:4098)ScsiDeviceIO: 2316: Cmd(0x4124007c3d00) 0x1a, CmdSN 0x1e5b from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:47:23.771Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c3d00, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:47:23.771Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007c3d00) 0x1a, CmdSN 0x1e5d from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:52:08.507Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x85 (0x4124007a3240, 5214) to dev "naa.6001e4f01d7de30017a27f5a7af7e639" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:52:08.507Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x85, CmdSN 0x93 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:52:08.507Z cpu4:4475)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x4d, CmdSN 0x94 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:52:08.507Z cpu4:4475)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x1a, CmdSN 0x95 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
                  2013-01-17T13:52:23.761Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c4000, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:52:23.761Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007c4000) 0x1a, CmdSN 0x1e69 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                  2013-01-17T13:52:23.786Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c4000, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                  2013-01-17T13:52:23.786Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007c4000) 0x1a, CmdSN 0x1e6b from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

                   

                  those errors are local disk, two SAS disks as RAID1 which are not used, and those errors could have always been there.

                   

                  at time of disconnect nothing

                  • 7. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                    einstein-a-go-go Hot Shot
                    vExpert

                    no NAT.

                     

                    checking hostd.logs

                     

                    this is the log tail at time of disconnect,

                     

                    2013-01-17T14:05:02.763Z [6E4D5B90 verbose 'Locale' opID=QS-host-501-d09c6b96-a8] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
                    2013-01-17T14:05:02.802Z [6E4D5B90 verbose 'ha-license-manager' opID=QS-host-501-d09c6b96-a8] Load: Loading existing file: /etc/vmware/license.cfg
                    2013-01-17T14:05:02.819Z [6E4D5B90 verbose 'Default' opID=QS-host-501-d09c6b96-a8] ha-license-manager:Validate -> Valid license detected for "VMware ESX Server 5.0" (lastError=0, desc.IsValid:Yes)
                    2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MaxRam value:
                    2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MaxRamPerCpu value:
                    2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MinRamPerCpu value:
                    2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse vram value:
                    2013-01-17T14:05:04.133Z [6DF01B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Created : haTask-ha-host-vim.option.OptionManager.updateValues-279508941
                    2013-01-17T14:05:04.144Z [6D97AB90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Completed : haTask-ha-host-vim.option.OptionManager.updateValues-279508941 Status success
                    2013-01-17T14:05:04.189Z [FF9E4B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Created : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-279508943
                    2013-01-17T14:05:04.354Z [6DBF7B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
                    2013-01-17T14:05:04.911Z [6DBF7B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
                    2013-01-17T14:05:04.942Z [6DF01B90 info 'ComputeResource' opID=QS-host-501-d09c6b96-89] Updated local swap datastore policy: false
                    2013-01-17T14:05:04.942Z [6DF01B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Completed : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-279508943 Status success
                    2013-01-17T14:05:05.215Z [FF9E4B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
                    2013-01-17T14:05:05.776Z [FF9E4B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
                    2013-01-17T14:05:09.229Z [6D9BBB90 verbose 'SoapAdapter'] Responded to service state request
                    2013-01-17T14:05:11.889Z [6DBF7B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                    2013-01-17T14:05:15.941Z [6DBB6B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:60935)
                    2013-01-17T14:05:15.941Z [6DBB6B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:61273)
                    2013-01-17T14:05:15.941Z [6DC79B90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                    2013-01-17T14:05:15.942Z [6D9BBB90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                    2013-01-17T14:05:20.031Z [6D97AB90 warning 'vim.PerformanceManager'] Calculated write I/O size 724992 for scsi0:0 is out of range -- 724992,prevBytes = 23061565440 curBytes = 23063015424 prevCommands = 650263curCommands = 650265
                    2013-01-17T14:05:25.943Z [6E4D5B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:57922)
                    2013-01-17T14:05:25.944Z [6DBF7B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                    2013-01-17T14:05:39.230Z [FFFA1B90 verbose 'SoapAdapter'] Responded to service state request
                    2013-01-17T14:05:44.566Z [6D97AB90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
                    2013-01-17T14:05:57.674Z [6DC79B90 verbose 'Locale' opID=2C3198E0-000003CA-5a-57] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
                    2013-01-17T14:06:09.232Z [6E4D5B90 verbose 'SoapAdapter'] Responded to service state request

                    • 8. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                      Sreec Master
                      Community WarriorsvExpert

                      Hi ,

                           Please increase the VC logging level also do confirm below mentioned steps are configured.

                       

                      VC console:

                      1. Click Administration > vCenter Server Settings.
                      2. Click Runtime Settings.
                      3. Enter the IP address of the vCenter Server in the vCenter Server Managed IP field under the Managed IP Address heading.
                      4. Click OK.
                      5. Close the vSphere Client session connected to vCenter.
                      6. Open a new vSphere Client session with vCenter.
                      7. Connect the ESX host. 
                      • 9. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                        zXi_Gamer Master

                        Can you check the vmkernel and hostd logs during this time "2013-01-17T13:08:XXXXXXX"

                        1 person found this helpful
                        • 10. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                          einstein-a-go-go Hot Shot
                          vExpert

                          oh yes, forgot to increase logging level.

                           

                          already changed managed IP Address.

                           

                          increased to verbose.

                           

                          at time of disconnection before and when disconnects

                           

                          vmkernel.log

                           

                          2013-01-17T14:22:19.462Z cpu7:72726)VmMemMigrate: vm 72726: 5005: Regular swap file bitmap checks out.
                          2013-01-17T14:22:19.464Z cpu7:72726)VMotion: 5458: 1358432531942417 D: Resume handshake successful
                          2013-01-17T14:22:19.531Z cpu6:72891)Hbr: 3405: Migration end received (worldID=72726) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
                          2013-01-17T14:22:19.551Z cpu1:72892)Swap: vm 72726: 3254: Starting prefault for the migration swap file
                          2013-01-17T14:22:19.563Z cpu1:72892)Swap: vm 72726: 3429: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success.
                          2013-01-17T14:22:19.713Z cpu2:7925)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
                          2013-01-17T14:22:23.851Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d23c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:22:23.851Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d23c0) 0x1a, CmdSN 0x1f38 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                          2013-01-17T14:22:23.876Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d23c0, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:22:23.876Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d23c0) 0x1a, CmdSN 0x1f3a from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                          2013-01-17T14:24:54.127Z cpu2:7914)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
                          2013-01-17T14:26:05.551Z cpu5:5120)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
                          2013-01-17T14:26:06.331Z cpu7:7841)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
                          2013-01-17T14:27:23.867Z cpu4:7892)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400804800, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:27:23.867Z cpu4:7892)ScsiDeviceIO: 2316: Cmd(0x412400804800) 0x1a, CmdSN 0x1f77 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                          2013-01-17T14:27:23.890Z cpu1:4097)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400804800, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:27:23.890Z cpu1:4097)ScsiDeviceIO: 2316: Cmd(0x412400804800) 0x1a, CmdSN 0x1f79 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                          2013-01-17T14:31:42.052Z cpu6:5167)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
                          2013-01-17T14:32:23.880Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007ed380, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:32:23.880Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007ed380) 0x1a, CmdSN 0x1fd4 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
                          2013-01-17T14:32:23.907Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007ed380, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
                          2013-01-17T14:32:23.907Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007ed380) 0x1a, CmdSN 0x1fd6 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

                           

                          hostd.log

                           

                          2013-01-17T14:33:42.072Z [6AEC8B90 warning 'PropertyCollector'] ComputeGUReq took 1958499 microSec
                          2013-01-17T14:33:43.182Z [6AA2AB90 verbose 'Default' opID=015AD5D3-0000029E-73] CloseSession called for session id=0ef5e3f6-7424-d0b0-fda8-04104e160e35
                          2013-01-17T14:33:43.183Z [6AA2AB90 info 'ha-eventmgr' opID=015AD5D3-0000029E-73] Event 293 : User vpxuser@140.247.49.144 logged out (login time: Friday, 02 January, 1970 01:15:55, number of API invocations: 0, user agent: VMware-client/5.1.0)
                          2013-01-17T14:33:53.696Z [6AA2AB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T14:34:03.715Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:52321)
                          2013-01-17T14:34:03.715Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:59344)
                          2013-01-17T14:34:03.715Z [6AB2EB90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                          2013-01-17T14:34:03.716Z [6AE86B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                          2013-01-17T14:34:04.502Z [6AE05B90 verbose 'Locale' opID=QS-host-537-c9d462e1-49] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
                          2013-01-17T14:34:04.540Z [6AD84B90 verbose 'ha-license-manager' opID=QS-host-537-c9d462e1-49] Load: Loading existing file: /etc/vmware/license.cfg
                          2013-01-17T14:34:04.558Z [6AD84B90 verbose 'Default' opID=QS-host-537-c9d462e1-49] ha-license-manager:Validate -> Valid license detected for "VMware ESX Server 5.0" (lastError=0, desc.IsValid:Yes)
                          2013-01-17T14:34:04.568Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MaxRam value:
                          2013-01-17T14:34:04.568Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MaxRamPerCpu value:
                          2013-01-17T14:34:04.569Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MinRamPerCpu value:
                          2013-01-17T14:34:04.569Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse vram value:
                          2013-01-17T14:34:05.871Z [6AAEDB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Created : haTask-ha-host-vim.option.OptionManager.updateValues-514232047
                          2013-01-17T14:34:05.882Z [6AAEDB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Completed : haTask-ha-host-vim.option.OptionManager.updateValues-514232047 Status success
                          2013-01-17T14:34:05.915Z [FF962D20 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Created : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-514232049
                          2013-01-17T14:34:06.084Z [6AD84B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
                          2013-01-17T14:34:06.643Z [6AD84B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
                          2013-01-17T14:34:06.673Z [6AA2AB90 info 'ComputeResource' opID=QS-host-537-c9d462e1-e1] Updated local swap datastore policy: false
                          2013-01-17T14:34:06.673Z [6AA2AB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Completed : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-514232049 Status success
                          2013-01-17T14:34:06.944Z [6A9C9B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
                          2013-01-17T14:34:07.504Z [6A9C9B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
                          2013-01-17T14:34:08.717Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:54793)
                          2013-01-17T14:34:08.718Z [6AA2AB90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                          2013-01-17T14:34:18.371Z [6AE05B90 info 'SysCommandPosix' opID=015AD5D3-000002E0-46-14] ForkExec(/usr/bin/sh)  74694
                          2013-01-17T14:34:23.698Z [6AD84B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T14:34:24.996Z [6AAEDB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                          2013-01-17T14:34:40.114Z [FF962D20 verbose 'Locale' opID=015AD5D3-0000029E-73-b5] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
                          2013-01-17T14:34:53.702Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request

                           

                          the login time looks odd 2 Jan 1970! but time is correct omn the server.

                           

                           

                          between that timeframe I have a gap!

                           

                          2013-01-17T13:07:46.917Z cpu5:7914)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
                          2013-01-17T13:12:23.641Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400769600, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

                           

                          hostd.log

                           

                          2013-01-17T13:08:04.709Z [FF962D20 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:08:15.918Z [6AE86B90 info 'SysCommandPosix' opID=D763F63B-000014C7] ForkExec(/usr/bin/sh)  66736
                          2013-01-17T13:08:28.009Z [6AE05B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:49557)
                          2013-01-17T13:08:28.009Z [6AE05B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:56426)
                          2013-01-17T13:08:28.009Z [6AD84B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                          2013-01-17T13:08:28.009Z [6AEC8B90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
                          2013-01-17T13:08:28.830Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Activation [N5Vmomi10ActivationE:0xffc5d538] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:session[671eee21-561b-acbf-d723-6ebf99980032]52f70510-8
                          2013-01-17T13:08:28.830Z [6AA6BB90 verbose 'Vmomi' opID=D763F63B-00001487] Arg version:
                          --> "142"
                          2013-01-17T13:08:28.831Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Throw vmodl.fault.RequestCanceled
                          2013-01-17T13:08:28.831Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Result:
                          --> (vmodl.fault.RequestCanceled) {
                          -->    dynamicType = <unset>,
                          -->    faultCause = (vmodl.MethodFault) null,
                          -->    msg = "",
                          --> }
                          2013-01-17T13:08:28.831Z [6A9C9B90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(error:Transport endpoint is not connected) with error N7Vmacore15SystemExceptionE(Connection reset by peer)
                          2013-01-17T13:08:34.710Z [6AE86B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:08:42.885Z [6AAEDB90 error 'EnvironmentManager'] Unable to obtain the default HW version
                          2013-01-17T13:08:42.899Z [6AAACB90 verbose 'DvsTracker'] FetchSwitches: added 0 items
                          2013-01-17T13:08:42.899Z [6AAACB90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
                          2013-01-17T13:08:42.899Z [6AAACB90 info 'DvsTracker'] FetchUplinkDVPortgroups: added 0 items
                          2013-01-17T13:09:03.845Z [6AB2EB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                          2013-01-17T13:09:04.711Z [6AE05B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:09:05.387Z [6AD84B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13571MB to 14423MHz/13572MB
                          2013-01-17T13:09:34.713Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:10:02.437Z [6AEC8B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Activation [N5Vmomi10ActivationE:0xffccc458] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:ha-property-collector]
                          2013-01-17T13:10:02.540Z [6AAACB90 verbose 'Vmomi'] Arg version:
                          --> "38"
                          2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Throw vmodl.fault.RequestCanceled
                          2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Result:
                          --> (vmodl.fault.RequestCanceled) {
                          -->    dynamicType = <unset>,
                          -->    faultCause = (vmodl.MethodFault) null,
                          -->    msg = "",
                          --> }
                          2013-01-17T13:10:02.541Z [6AB2EB90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(error:Transport endpoint is not connected) with error N7Vmacore15SystemExceptionE(Connection reset by peer)
                          2013-01-17T13:10:04.714Z [6AAACB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:10:05.388Z [6AEC8B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13572MB to 14423MHz/13571MB
                          2013-01-17T13:10:11.702Z [6A9C9B90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
                          2013-01-17T13:10:34.715Z [6AB2EB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:10:35.290Z [FF962D20 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                          2013-01-17T13:11:04.717Z [FF962D20 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:11:05.390Z [6AE86B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13571MB to 14423MHz/13572MB
                          2013-01-17T13:11:34.718Z [6AB2EB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:12:04.719Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:12:06.734Z [6AD84B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                          2013-01-17T13:12:09.398Z [6AD84B90 verbose 'DvsManager'] PersistAllDvsInfo called
                          2013-01-17T13:12:34.721Z [6AE05B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:13:04.722Z [6AA2AB90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:13:11.706Z [6AAACB90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
                          2013-01-17T13:13:34.723Z [6AD84B90 verbose 'SoapAdapter'] Responded to service state request
                          2013-01-17T13:13:38.178Z [6AA6BB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
                          2013-01-17T13:13:42.936Z [6A9C9B90 error 'EnvironmentManager'] Unable to obtain the default HW version

                          • 12. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                            einstein-a-go-go Hot Shot
                            vExpert

                            I'll follow, just like to add

                             

                            it's been rock solid for 6 months, recent changes

                             

                            1. addition of VMware Data Protection appliance.(currently now turn off)

                            2. integration with Microsoft SCVMM 2012 SP1 Beta (currently now turn off)

                             

                            and still doing it every 45 seconds.

                             

                            Now it's possible somebody has applied a lockdown firewall Group Policy to the OU!

                             

                            that's a thought checking....

                            • 13. Re: hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)
                              memaad Master

                              Hi,

                               

                              As from logs, I see no heartbeat messages , apply this in vCenter server

                               

                              To increase the timeout limit:
                              1. Open the C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg file (on Windows 2008, C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg) using a text editor. For more information on the location of the vpxd.cfg file, seeCommon vCenter Server and vSphere Client Windows paths (1028185).

                              2. Add this information in the <vpxd> tags:

                                <heartbeat>
                                <notRespondingTimeout>180</notRespondingTimeout>
                                </heartbeat>


                              3. Restart the VMware VirtualCenter Server service.

                               

                              Lets monitor after making this changes. Also would like to know if this is only host that is affected.

                               

                              Regards

                              Mohammed

                              1 2 Previous Next