VMware Cloud Community
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding after 45 seconds (always)

I've got a very odd installation, it's been stable for 6 months, I'm not sure if this is conciendence after implementing VDP, but now after 45 seconds, the ESXi 5.1.0 Build 799733 hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as not responding.

I can right click each host, and issue a Connect and the Servers are re-connected, and then after 45 seconds, appear as not responding again.

The vCenter is a virtual machine, on the cluster of two Hosts, using VMXNET3 interface, fully patched.

from the vCenter server at the time of the disconnects or not responding, I'm not seeing any ping timeouts to the hosts, from vCenter VM to Hosts, I can also connect directly to the ESXi servers, with vSphere Client, and no issues there, I remain connected.

DNS is okay, Default Gateway is okay, database being used is default install with vCenter (simple install), SQL Express, database is not full or reached it's limit.

nothing in Windows Event Log...

things tried

1. Restarted ESXi hosts

2. Restart Network Management agents

3. Upgraded vCenter Server to Build Build 947673

4. restarted vCenter Server

5. Shutdown all VMs, except vCenter Server including VDP.

not done hosts yet!

in the vpxd logs I can see the following:- so I can see I think the disconnection, but no idea as to why!

2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx003.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx004.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5203136b-c37d-c238-69c3-7101151dae9b within 4398798518000 ms
2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5274675a-2fb6-a9ae-7787-9f24c69ff6e5 within 4398798518000 ms
2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-28
2013-01-17T13:08:39.659Z [04264 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-28
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-17T13:08:39.659Z [04264 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-28, marking host as notResponding
2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-24
2013-01-17T13:08:39.659Z [06620 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-24
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
2013-01-17T13:08:39.659Z [06620 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-24, marking host as notResponding
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-28
2013-01-17T13:08:39.784Z [04264 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-28 as dirty.
2013-01-17T13:08:39.784Z [04264 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
2013-01-17T13:08:39.800Z [04264 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.831Z [06620 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-24
2013-01-17T13:08:39.987Z [06620 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-24 as dirty.
2013-01-17T13:08:39.987Z [06620 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
2013-01-17T13:08:40.003Z [06620 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --

any ideas?

Reply
0 Kudos
1 Solution

Accepted Solutions
Sreec
VMware Employee
VMware Employee
Jump to solution

Hi ,

    Looks like a port level issue.Can you please make KB:http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&e... is followed properly.

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered

View solution in original post

Reply
0 Kudos
16 Replies
zXi_Gamer
Virtuoso
Virtuoso
Jump to solution

Well one clue here is that if it is happening exactly after 45 secs, then it must be some agent which is threading at regular intervals. Also, since there are no timeouts, one possible reason might be the vpxa[VC] recognizes that the hosts are not meeting some requirement and then disconnects them.

do check on the hosts vmkernel log file before the disconnect to confirm any information. In your cluster settings, can you do a quick "apply recommendation" and find if any requirement is not met, like FDM or HA.

Reply
0 Kudos
aravinds3107
Virtuoso
Virtuoso
Jump to solution

Check if KB: 2019865 is applicable for your setup

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful |Blog: http://aravindsivaraman.com/ | Twitter : ss_aravind
Reply
0 Kudos
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

Standard LAN, no NAT or firewalls.

also tried remvoing hosts from vCenter, and re-adding.

checking managed IP Address

checking managed vcenter fqdn

hosts can ping vCenter by fqdn

hosts can ping vCenter by IP Address

Reply
0 Kudos
Sreec
VMware Employee
VMware Employee
Jump to solution

Hi ,

     From the logs i can see 2013-01-17T13:08:39.784Z [04264 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-28 as dirty.

Hope that is the correct host?

1.If network connectivity is fine btwn ESX&VC,i would suggest you to go ahead and reinstall the vpxa agent (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=103191...)

Note:When you connect the host back,the agent will be automatically pushed Smiley Happy

2.If the issue still persist,please do post the vpxd logs and hostd logs

Note:If possible increase the VC logging level to verbose/trivia prior to log capture

3.Do you have NAT configured btwn Host/VC? If so,its unsupported configuration.However there is a small

tweaking that we can do make it work!!!

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Reply
0 Kudos
zXi_Gamer
Virtuoso
Virtuoso
Jump to solution

Hmmm.. one particular message

2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx'  opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host  5203136b-c37d-c238-69c3-7101151dae9b within 4398798518000 ms

If there are no heartbeats for 4398798518000ms, i.e 7 mins if I am correct, then can you confirm if the hostd is working in the servers. Do check on var/log/hostd.log and check if the hostd crashes with any information..

I can also suggest to raise a support request with VMware if this is affecting production environment

einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

very little happending in vmkernel.log at time of disconnect

2013-01-17T13:41:03.815Z cpu5:5120)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
2013-01-17T13:41:04.613Z cpu6:7914)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
2013-01-17T13:42:23.730Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d33c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:42:23.730Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d33c0) 0x1a, CmdSN 0x1e36 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:42:23.754Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d33c0, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:42:23.754Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007d33c0) 0x1a, CmdSN 0x1e38 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:42:24.389Z cpu1:7841)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
2013-01-17T13:43:03.680Z cpu7:5167)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
2013-01-17T13:43:04.475Z cpu1:7841)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
2013-01-17T13:44:21.227Z cpu4:5123)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
2013-01-17T13:45:04.135Z cpu2:7914)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
2013-01-17T13:45:04.927Z cpu7:7925)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
2013-01-17T13:47:23.746Z cpu2:4098)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c3d00, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:47:23.746Z cpu2:4098)ScsiDeviceIO: 2316: Cmd(0x4124007c3d00) 0x1a, CmdSN 0x1e5b from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:47:23.771Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c3d00, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:47:23.771Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007c3d00) 0x1a, CmdSN 0x1e5d from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:52:08.507Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x85 (0x4124007a3240, 5214) to dev "naa.6001e4f01d7de30017a27f5a7af7e639" on path "vmhba1:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:52:08.507Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x85, CmdSN 0x93 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:52:08.507Z cpu4:4475)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x4d, CmdSN 0x94 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:52:08.507Z cpu4:4475)ScsiDeviceIO: 2316: Cmd(0x4124007a3240) 0x1a, CmdSN 0x95 from world 5214 to dev "naa.6001e4f01d7de30017a27f5a7af7e639" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2013-01-17T13:52:23.761Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c4000, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:52:23.761Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007c4000) 0x1a, CmdSN 0x1e69 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T13:52:23.786Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007c4000, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T13:52:23.786Z cpu5:4101)ScsiDeviceIO: 2316: Cmd(0x4124007c4000) 0x1a, CmdSN 0x1e6b from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

those errors are local disk, two SAS disks as RAID1 which are not used, and those errors could have always been there.

at time of disconnect nothing

Reply
0 Kudos
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

no NAT.

checking hostd.logs

this is the log tail at time of disconnect,

2013-01-17T14:05:02.763Z [6E4D5B90 verbose 'Locale' opID=QS-host-501-d09c6b96-a8] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
2013-01-17T14:05:02.802Z [6E4D5B90 verbose 'ha-license-manager' opID=QS-host-501-d09c6b96-a8] Load: Loading existing file: /etc/vmware/license.cfg
2013-01-17T14:05:02.819Z [6E4D5B90 verbose 'Default' opID=QS-host-501-d09c6b96-a8] ha-license-manager:Validate -> Valid license detected for "VMware ESX Server 5.0" (lastError=0, desc.IsValid:Yes)
2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MaxRam value:
2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MaxRamPerCpu value:
2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse MinRamPerCpu value:
2013-01-17T14:05:02.830Z [6E4D5B90 error 'Default' opID=QS-host-501-d09c6b96-a8] Unable to parse vram value:
2013-01-17T14:05:04.133Z [6DF01B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Created : haTask-ha-host-vim.option.OptionManager.updateValues-279508941
2013-01-17T14:05:04.144Z [6D97AB90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Completed : haTask-ha-host-vim.option.OptionManager.updateValues-279508941 Status success
2013-01-17T14:05:04.189Z [FF9E4B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Created : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-279508943
2013-01-17T14:05:04.354Z [6DBF7B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
2013-01-17T14:05:04.911Z [6DBF7B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
2013-01-17T14:05:04.942Z [6DF01B90 info 'ComputeResource' opID=QS-host-501-d09c6b96-89] Updated local swap datastore policy: false
2013-01-17T14:05:04.942Z [6DF01B90 info 'TaskManager' opID=QS-host-501-d09c6b96-89] Task Completed : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-279508943 Status success
2013-01-17T14:05:05.215Z [FF9E4B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
2013-01-17T14:05:05.776Z [FF9E4B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
2013-01-17T14:05:09.229Z [6D9BBB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T14:05:11.889Z [6DBF7B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T14:05:15.941Z [6DBB6B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:60935)
2013-01-17T14:05:15.941Z [6DBB6B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:61273)
2013-01-17T14:05:15.941Z [6DC79B90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:05:15.942Z [6D9BBB90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:05:20.031Z [6D97AB90 warning 'vim.PerformanceManager'] Calculated write I/O size 724992 for scsi0:0 is out of range -- 724992,prevBytes = 23061565440 curBytes = 23063015424 prevCommands = 650263curCommands = 650265
2013-01-17T14:05:25.943Z [6E4D5B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:57922)
2013-01-17T14:05:25.944Z [6DBF7B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:05:39.230Z [FFFA1B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T14:05:44.566Z [6D97AB90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
2013-01-17T14:05:57.674Z [6DC79B90 verbose 'Locale' opID=2C3198E0-000003CA-5a-57] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
2013-01-17T14:06:09.232Z [6E4D5B90 verbose 'SoapAdapter'] Responded to service state request

Reply
0 Kudos
Sreec
VMware Employee
VMware Employee
Jump to solution

Hi ,

     Please increase the VC logging level also do confirm below mentioned steps are configured.

VC console:

  1. Click Administration > vCenter Server Settings.
  2. Click Runtime Settings.
  3. Enter the IP address of the vCenter Server in the vCenter Server Managed IP field under the Managed IP Address heading.
  4. Click OK.
  5. Close the vSphere Client session connected to vCenter.
  6. Open a new vSphere Client session with vCenter.
  7. Connect the ESX host. 
Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Reply
0 Kudos
zXi_Gamer
Virtuoso
Virtuoso
Jump to solution

Can you check the vmkernel and hostd logs during this time "2013-01-17T13:08:XXXXXXX"

einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

oh yes, forgot to increase logging level.

already changed managed IP Address.

increased to verbose.

at time of disconnection before and when disconnects

vmkernel.log

2013-01-17T14:22:19.462Z cpu7:72726)VmMemMigrate: vm 72726: 5005: Regular swap file bitmap checks out.
2013-01-17T14:22:19.464Z cpu7:72726)VMotion: 5458: 1358432531942417 😧 Resume handshake successful
2013-01-17T14:22:19.531Z cpu6:72891)Hbr: 3405: Migration end received (worldID=72726) (migrateType=1) (event=1) (isSource=0) (sharedConfig=1)
2013-01-17T14:22:19.551Z cpu1:72892)Swap: vm 72726: 3254: Starting prefault for the migration swap file
2013-01-17T14:22:19.563Z cpu1:72892)Swap: vm 72726: 3429: Finish swapping in migration swap file. (faulted 0 pages, pshared 0 pages). Success.
2013-01-17T14:22:19.713Z cpu2:7925)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
2013-01-17T14:22:23.851Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d23c0, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:22:23.851Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d23c0) 0x1a, CmdSN 0x1f38 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T14:22:23.876Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007d23c0, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:22:23.876Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007d23c0) 0x1a, CmdSN 0x1f3a from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T14:24:54.127Z cpu2:7914)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
2013-01-17T14:26:05.551Z cpu5:5120)Config: 347: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
2013-01-17T14:26:06.331Z cpu7:7841)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
2013-01-17T14:27:23.867Z cpu4:7892)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400804800, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:27:23.867Z cpu4:7892)ScsiDeviceIO: 2316: Cmd(0x412400804800) 0x1a, CmdSN 0x1f77 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T14:27:23.890Z cpu1:4097)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400804800, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:27:23.890Z cpu1:4097)ScsiDeviceIO: 2316: Cmd(0x412400804800) 0x1a, CmdSN 0x1f79 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T14:31:42.052Z cpu6:5167)Config: 347: "VMOverheadGrowthLimit" = -1, Old Value: 0, (Status: 0x0)
2013-01-17T14:32:23.880Z cpu0:4104)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007ed380, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:32:23.880Z cpu0:4104)ScsiDeviceIO: 2316: Cmd(0x4124007ed380) 0x1a, CmdSN 0x1fd4 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2013-01-17T14:32:23.907Z cpu4:4100)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x4124007ed380, 0) to dev "t10.DP______BACKPLANE000000" on path "vmhba1:C0:T8:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2013-01-17T14:32:23.907Z cpu4:4100)ScsiDeviceIO: 2316: Cmd(0x4124007ed380) 0x1a, CmdSN 0x1fd6 from world 0 to dev "t10.DP______BACKPLANE000000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

hostd.log

2013-01-17T14:33:42.072Z [6AEC8B90 warning 'PropertyCollector'] ComputeGUReq took 1958499 microSec
2013-01-17T14:33:43.182Z [6AA2AB90 verbose 'Default' opID=015AD5D3-0000029E-73] CloseSession called for session id=0ef5e3f6-7424-d0b0-fda8-04104e160e35
2013-01-17T14:33:43.183Z [6AA2AB90 info 'ha-eventmgr' opID=015AD5D3-0000029E-73] Event 293 : User vpxuser@140.247.49.144 logged out (login time: Friday, 02 January, 1970 01:15:55, number of API invocations: 0, user agent: VMware-client/5.1.0)
2013-01-17T14:33:53.696Z [6AA2AB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T14:34:03.715Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:52321)
2013-01-17T14:34:03.715Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:59344)
2013-01-17T14:34:03.715Z [6AB2EB90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:34:03.716Z [6AE86B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:34:04.502Z [6AE05B90 verbose 'Locale' opID=QS-host-537-c9d462e1-49] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
2013-01-17T14:34:04.540Z [6AD84B90 verbose 'ha-license-manager' opID=QS-host-537-c9d462e1-49] Load: Loading existing file: /etc/vmware/license.cfg
2013-01-17T14:34:04.558Z [6AD84B90 verbose 'Default' opID=QS-host-537-c9d462e1-49] ha-license-manager:Validate -> Valid license detected for "VMware ESX Server 5.0" (lastError=0, desc.IsValid:Yes)
2013-01-17T14:34:04.568Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MaxRam value:
2013-01-17T14:34:04.568Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MaxRamPerCpu value:
2013-01-17T14:34:04.569Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse MinRamPerCpu value:
2013-01-17T14:34:04.569Z [6AD84B90 error 'Default' opID=QS-host-537-c9d462e1-49] Unable to parse vram value:
2013-01-17T14:34:05.871Z [6AAEDB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Created : haTask-ha-host-vim.option.OptionManager.updateValues-514232047
2013-01-17T14:34:05.882Z [6AAEDB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Completed : haTask-ha-host-vim.option.OptionManager.updateValues-514232047 Status success
2013-01-17T14:34:05.915Z [FF962D20 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Created : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-514232049
2013-01-17T14:34:06.084Z [6AD84B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
2013-01-17T14:34:06.643Z [6AD84B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
2013-01-17T14:34:06.673Z [6AA2AB90 info 'ComputeResource' opID=QS-host-537-c9d462e1-e1] Updated local swap datastore policy: false
2013-01-17T14:34:06.673Z [6AA2AB90 info 'TaskManager' opID=QS-host-537-c9d462e1-e1] Task Completed : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-514232049 Status success
2013-01-17T14:34:06.944Z [6A9C9B90 verbose 'NetConfigProvider'] FetchFn: List of pnics opted out
2013-01-17T14:34:07.504Z [6A9C9B90 verbose 'Default'] StorageSystemVmkImplProvider: advanced option get key =  VMFS.UnresolvedVolumeLiveCheck
2013-01-17T14:34:08.717Z [FF962D20 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:54793)
2013-01-17T14:34:08.718Z [6AA2AB90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T14:34:18.371Z [6AE05B90 info 'SysCommandPosix' opID=015AD5D3-000002E0-46-14] ForkExec(/usr/bin/sh)  74694
2013-01-17T14:34:23.698Z [6AD84B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T14:34:24.996Z [6AAEDB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T14:34:40.114Z [FF962D20 verbose 'Locale' opID=015AD5D3-0000029E-73-b5] Default resource used for 'counter.sys.resourceMemConsumed.summary' expected in module 'perf'.
2013-01-17T14:34:53.702Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request

the login time looks odd 2 Jan 1970! but time is correct omn the server.

between that timeframe I have a gap!

2013-01-17T13:07:46.917Z cpu5:7914)Config: 347: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)
2013-01-17T13:12:23.641Z cpu5:4101)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x1a (0x412400769600, 0) to dev "mpx.vmhba0:C0:T0:L0" on path "vmhba0:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

hostd.log

2013-01-17T13:08:04.709Z [FF962D20 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:08:15.918Z [6AE86B90 info 'SysCommandPosix' opID=D763F63B-000014C7] ForkExec(/usr/bin/sh)  66736
2013-01-17T13:08:28.009Z [6AE05B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8307, peer=127.0.0.1:49557)
2013-01-17T13:08:28.009Z [6AE05B90 verbose 'Default'] Timed out reading between HTTP requests. : Read timeout after approximately 50000ms. Closing stream TCP(local=127.0.0.1:8309, peer=127.0.0.1:56426)
2013-01-17T13:08:28.009Z [6AD84B90 error 'SoapAdapter.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T13:08:28.009Z [6AEC8B90 error 'HttpSvc.HTTPService'] Failed to read request; stream: TCP(<null>), error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2013-01-17T13:08:28.830Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Activation [N5Vmomi10ActivationE:0xffc5d538] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:session[671eee21-561b-acbf-d723-6ebf99980032]52f70510-8
2013-01-17T13:08:28.830Z [6AA6BB90 verbose 'Vmomi' opID=D763F63B-00001487] Arg version:
--> "142"
2013-01-17T13:08:28.831Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Throw vmodl.fault.RequestCanceled
2013-01-17T13:08:28.831Z [6AA6BB90 info 'Vmomi' opID=D763F63B-00001487] Result:
--> (vmodl.fault.RequestCanceled) {
-->    dynamicType = <unset>,
-->    faultCause = (vmodl.MethodFault) null,
-->    msg = "",
--> }
2013-01-17T13:08:28.831Z [6A9C9B90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(error:Transport endpoint is not connected) with error N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-01-17T13:08:34.710Z [6AE86B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:08:42.885Z [6AAEDB90 error 'EnvironmentManager'] Unable to obtain the default HW version
2013-01-17T13:08:42.899Z [6AAACB90 verbose 'DvsTracker'] FetchSwitches: added 0 items
2013-01-17T13:08:42.899Z [6AAACB90 verbose 'DvsTracker'] FetchDVPortgroups: added 0 items
2013-01-17T13:08:42.899Z [6AAACB90 info 'DvsTracker'] FetchUplinkDVPortgroups: added 0 items
2013-01-17T13:09:03.845Z [6AB2EB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T13:09:04.711Z [6AE05B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:09:05.387Z [6AD84B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13571MB to 14423MHz/13572MB
2013-01-17T13:09:34.713Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:10:02.437Z [6AEC8B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Activation [N5Vmomi10ActivationE:0xffccc458] : Invoke done [waitForUpdates] on [vmodl.query.PropertyCollector:ha-property-collector]
2013-01-17T13:10:02.540Z [6AAACB90 verbose 'Vmomi'] Arg version:
--> "38"
2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Throw vmodl.fault.RequestCanceled
2013-01-17T13:10:02.540Z [6AAACB90 info 'Vmomi'] Result:
--> (vmodl.fault.RequestCanceled) {
-->    dynamicType = <unset>,
-->    faultCause = (vmodl.MethodFault) null,
-->    msg = "",
--> }
2013-01-17T13:10:02.541Z [6AB2EB90 error 'SoapAdapter.HTTPService'] HTTP Transaction failed on stream TCP(error:Transport endpoint is not connected) with error N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-01-17T13:10:04.714Z [6AAACB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:10:05.388Z [6AEC8B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13572MB to 14423MHz/13571MB
2013-01-17T13:10:11.702Z [6A9C9B90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
2013-01-17T13:10:34.715Z [6AB2EB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:10:35.290Z [FF962D20 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T13:11:04.717Z [FF962D20 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:11:05.390Z [6AE86B90 verbose 'ResourcePool ha-root-pool'] Root pool capacity changed from 14423MHz/13571MB to 14423MHz/13572MB
2013-01-17T13:11:34.718Z [6AB2EB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:12:04.719Z [6AAEDB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:12:06.734Z [6AD84B90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T13:12:09.398Z [6AD84B90 verbose 'DvsManager'] PersistAllDvsInfo called
2013-01-17T13:12:34.721Z [6AE05B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:13:04.722Z [6AA2AB90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:13:11.706Z [6AAACB90 verbose 'Vmsvc'] RefreshVms updated overhead for 1 VM
2013-01-17T13:13:34.723Z [6AD84B90 verbose 'SoapAdapter'] Responded to service state request
2013-01-17T13:13:38.178Z [6AA6BB90 verbose 'Cimsvc'] Ticket issued for CIMOM version 1.0, user root
2013-01-17T13:13:42.936Z [6A9C9B90 error 'EnvironmentManager'] Unable to obtain the default HW version

Reply
0 Kudos
Sreec
VMware Employee
VMware Employee
Jump to solution

Hi ,

    Looks like a port level issue.Can you please make KB:http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&e... is followed properly.

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Reply
0 Kudos
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

I'll follow, just like to add

it's been rock solid for 6 months, recent changes

1. addition of VMware Data Protection appliance.(currently now turn off)

2. integration with Microsoft SCVMM 2012 SP1 Beta (currently now turn off)

and still doing it every 45 seconds.

Now it's possible somebody has applied a lockdown firewall Group Policy to the OU!

that's a thought checking....

Reply
0 Kudos
memaad
Virtuoso
Virtuoso
Jump to solution

Hi,

As from logs, I see no heartbeat messages , apply this in vCenter server

To increase the timeout limit:
  1. Open the C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg file (on Windows 2008, C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg) using a text editor. For more information on the location of the vpxd.cfg file, seeCommon vCenter Server and vSphere Client Windows paths (1028185).

  2. Add this information in the <vpxd> tags:

    <heartbeat>
    <notRespondingTimeout>180</notRespondingTimeout>
    </heartbeat>


  3. Restart the VMware VirtualCenter Server service.

Lets monitor after making this changes. Also would like to know if this is only host that is affected.

Regards

Mohammed

Mohammed | Mark it as helpful or correct if my suggestion is useful.
Reply
0 Kudos
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

I'll try that this morning, after checking firewall issues.

Reply
0 Kudos
einstein-a-go-g
Hot Shot
Hot Shot
Jump to solution

Ok, stable at last, it appears, the vCenter Server was moved OUs, and then a Group Policy Update was applied, this must have contained firewall changes which caused the upset. Just need to find out which firewall rules broke it!

but thanks for the wake-up call, I cannot believe I've wasted so much time on this, because the server was moved OUs!

Thanks Guys!

Reply
0 Kudos
Sreec
VMware Employee
VMware Employee
Jump to solution

Hi,

    Great to hear that!!! Wish you a good weekend Smiley Happy

Cheers,
Sree | VCIX-5X| VCAP-5X| VExpert 6x|Cisco Certified Specialist
Please KUDO helpful posts and mark the thread as solved if answered
Reply
0 Kudos