One of the ESXi 5.5 hosts in our cluster went into a Not Responding state in vCenter. We tried to restart the management services, but the restart got stuck partway through. We also tried to reconnect the host to vCenter, but that failed.
We are also unable to connect to the host through the vSphere Client.
The vmkernel log shows the following error messages:
ALERT:hostd detected to be non-responsive
World: 14302: VC opID hostd-6e72 maps to vmkernel opID 6e44ae8a
WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB
Hi there,
According to the message, it seems someone may have changed the memory reservations for the ESXi services to "free up some memory"; the snmpd daemon in particular appears to be affected. You can find the setting in the thick client under Configuration -> Software -> System Resource Allocation -> Advanced; search there for "snmpd". If you do not use SNMP, you can try disabling it by following this KB: vSphere Documentation Center
It could also be a bug in the ESXi version you are using, in which case updating to the latest patch might help. Alternatively, your host may be running out of memory quickly, but in that case it would start swapping virtual memory out to disk rather than hanging.
Can you please post vmkernel.log and vmkwarning.log from /var/log? You can retrieve them via WinSCP after connecting to the affected ESXi host.
Hi Alistar,
I am unable to connect with WinSCP to pull the requested logs, so I have taken them over SSH instead.
vmkwarning.log
T05:16:00.638Z cpu11:9146523)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.55664" state in doubt; requested fast path state update...
T05:20:01.652Z cpu22:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:20:38.177Z cpu24:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:21:14.391Z cpu24:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:21:35.250Z cpu20:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:22:54.881Z cpu20:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:23:14.257Z cpu18:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:23:32.880Z cpu34:33764)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
T05:26:31.506Z cpu46:12036491)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB
T05:26:32.170Z cpu13:12036500)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB
T05:26:32.819Z cpu23:12036509)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB
T05:26:33.468Z cpu36:12036518)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB
VMkernel logs
cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu19:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu35:33764)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu18:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded
cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x4137011d5680) 0x2a, CmdSN 0x800e0045 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0
cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x41370020ca40) 0x2a, CmdSN 0x800e0079 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0
cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x413704673b80) 0x2a, CmdSN 0x800e0061 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0
cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x413702caaec0) 0x2a, CmdSN 0x800e0059 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0
cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 70 flags 3
cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 112 status FCPIO_SUCCESS
cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 71 flags 3
cpu35:33759)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600" state in doubt; requested fast path state update...
2015-08-06T17:52:59.463Z cpu35:33759)ScsiDeviceIO: 2307: Cmd(0x413701462ec0) 0x2a, CmdSN 0x800e003a from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0
cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 113 status FCPIO_SUCCESS
cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 72 flags 3
cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 114 status FCPIO_SUCCESS
cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
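For anyone triaging excerpts like the two above, here is a minimal Python sketch (not a VMware tool) that tallies how often each kind of message recurs, so the dominant warnings stand out from the noise. The helper names `normalize` and `tally_messages` and the normalization rules are my own assumptions; the sample lines are copied from the logs in this post.

```python
import re
from collections import Counter

# Optional (possibly truncated) timestamp followed by the "cpuN:WORLD)" prefix.
PREFIX = re.compile(r"^(?:\S*T[\d:.]+Z\s+)?cpu\d+:\d+\)")
# Long hex values (5+ digits) are command addresses and serial numbers;
# short ones such as opcodes and H:/D:/P: status codes are kept.
LONG_HEX = re.compile(r"0x[0-9a-fA-F]{5,}")
QUOTED = re.compile(r'"[^"]*"')

def normalize(line):
    """Strip per-line details so repeats of the same message group together."""
    line = PREFIX.sub("", line.strip())
    line = LONG_HEX.sub("0x..", line)
    return QUOTED.sub('"<dev>"', line)

def tally_messages(lines):
    """Count occurrences of each normalized message, ignoring blank lines."""
    return Counter(normalize(line) for line in lines if line.strip())

sample = [
    "T05:20:01.652Z cpu22:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded",
    "T05:20:38.177Z cpu24:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded",
    "cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded",
    "T05:26:31.506Z cpu46:12036491)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB",
    'cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x4137011d5680) 0x2a, CmdSN 0x800e0045 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0',
    'cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x41370020ca40) 0x2a, CmdSN 0x800e0079 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0',
]

for message, count in tally_messages(sample).most_common():
    print(count, message)
```

Run against a full vmkernel.log (for example by passing `open("vmkernel.log")` instead of `sample`), this should show at a glance whether the cartel warnings, the snmpd memory warnings, or the storage failures dominate.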
The "Maximum number of cartels per userspace exceeded" messages seem to point to the same issue described in another post, "Help with hung up host".
At the moment that post still has no answer but, as Alistar said, try patching the hypervisor and see what happens.
Just a few questions: is your hardware healthy, and is your workload free of contention? Could you please post basic information such as memory and CPU in use/available? Are there any non-default cluster settings, such as EVC?
Hi
The hosts are Dell R710 models. Drivers, BIOS, and firmware are up to date, EVC mode is not enabled on the cluster, and enough resources are available.
ESXi version: ESXi 5.5 Patch 5 re-release, released 2015-05-08, build 2718055
OK, let's try opening a support request with VMware and/or Dell.
Did you install the ESXi version downloaded from Dell? Vendors always recommend using their customized ESXi image.
Regards
I have installed only the Dell customized version on my ESXi hosts.
Today 5 more hosts went into a Not Responding state.
NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x41368551df80, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
cpu22:32827)ScsiDeviceIO: 2338: Cmd(0x41368551df80) 0x1a, CmdSN 0x18078b from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
cpu22:32827)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x41368551df80, 0) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
cpu22:32827)ScsiDeviceIO: 2338: Cmd(0x41368551df80) 0x1a, CmdSN 0x18078c from world 0 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
cpu16:2305207)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x12 (0x41368551df80, 0) to dev "naa.6005" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
cpu4:45191)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e833bc040, 0) to dev "naa.6005" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
cpu4:32801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e42962340, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
cpu4:32801)ScsiDeviceIO: 2338: Cmd(0x412e42962340) 0x1a, CmdSN 0x1eab440 from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
cpu4:32801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e42962340, 0) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
cpu14:16489356)ScsiDeviceIO: 2338: Cmd(0x413640001b00) 0x4d, CmdSN 0x3256 from world 25161080 to dev "naa.60051d05" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
cpu14:16489356)ScsiDeviceIO: 2338: Cmd(0x413640001b00) 0x1a, CmdSN 0x3257 from world 25161080 to dev "naa.6005005" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
cpu12:25161184)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40
cpu12:25161184)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffb01cac)
cpu6:32803)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e4b263ec0, 0) to dev "mpx.vmhba5:C0:T0:L0" on path "vmhba5:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
cpu6:32803)ScsiDeviceIO: 2338: Cmd(0x412e4b263ec0) 0x1a, CmdSN 0xe8601a from world 0 to dev "mpx.vmhba5:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
cpu11:32808)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x4136438aa100, 0) to dev "mpx.vmhba5:C0:T0:L0" on path "vmhba5:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
cpu9:33005)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x12 (0x412e838e3740, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
cpu23:250801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x413682fa6100, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
cpu9:2305202)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e89606e40, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
cpu5:32810)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e86059a80, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
cpu5:32810)ScsiDeviceIO: 2338: Cmd(0x412e86059a80) 0x1a, CmdSN 0x173c73 from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
WARNING: HBX: 1884: HB [HB state abcdef02 offset 3952640 gen 7019 stampUS 6932991013173 uuid 5558819e-223c42e2-8639-0025b5270a3f jrnl <FB 1523> drv 14.60] was aborted on vol 'Scratch-', disk contents u$
cpu8:11251940)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a53737741" state in doubt; requested fast path state update...
cpu4:32951)WARNING: J3: 3297: Error committing txn callerID: 0xc1d0000f to slot 0: IO was aborted by VMFS via a virt-reset on the device
cpu26:11531505)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a7741" state in doubt; requested fast path state update...
cpu26:11531505)WARNING: HBX: 1884: HB [HB state abcdef02 offset 3952640 gen 7019 stampUS 6932997945332 uuid 5558819e-223c42e2-8639-0025b5270a3f jrnl <FB 1523> drv 14.60] was aborted on vol 'Scratch-', disk conten$
cpu2:32951)WARNING: J3: 3297: Error committing txn callerID: 0xc1d0000f to slot 0: IO was aborted by VMFS via a virt-reset on the device
cpu25:12973300)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a7741" state in doubt; requested fast path state update...
2015-08-05T17:45:50.608Z cpu25:12973300)WARNING: HB
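In case it helps others read these lines, below is a rough Python sketch that decodes the `H:`/`D:`/`P:` status fields, the SCSI opcode, and the sense bytes. The lookup tables are deliberately partial and reflect my understanding of VMware's guidance on interpreting SCSI sense codes and the SCSI standards, so treat them as assumptions to verify; `decode_status` is a hypothetical helper name. As far as I know, sense bytes labelled "Possible" rather than "Valid" may not be meaningful, so the sketch flags that.

```python
import re

# Partial tables (assumptions to verify against VMware and T10 documentation).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
               0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
OPCODES = {0x12: "INQUIRY", 0x1a: "MODE SENSE(6)", 0x2a: "WRITE(10)",
           0x4d: "LOG SENSE"}
SENSE_KEYS = {0x0: "NO SENSE", 0x5: "ILLEGAL REQUEST", 0xe: "MISCOMPARE"}
ASC = {(0x00, 0x0): "NO ADDITIONAL SENSE INFORMATION",
       (0x1d, 0x0): "MISCOMPARE DURING VERIFY OPERATION",
       (0x20, 0x0): "INVALID COMMAND OPERATION CODE",
       (0x24, 0x0): "INVALID FIELD IN CDB"}

STATUS = re.compile(
    r"H:(?P<host>0x[0-9a-f]+) D:(?P<dev>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)"
    r"(?: (?P<kind>Valid|Possible) sense data: "
    r"(?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+))?"
)
# Matches both "Cmd(0x...) 0x1a," (ScsiDeviceIO) and "Cmd 0x1a (0x..." (NMP).
OPCODE = re.compile(r"Cmd(?:\(0x[0-9a-f]+\))? (?P<op>0x[0-9a-f]+)")

def name(table, value):
    """Look up a code, falling back to the raw value for unlisted codes."""
    return table.get(value, "unknown (%#x)" % value)

def decode_status(line):
    """Return a dict describing the failure, or None if no H:/D:/P: is found."""
    status = STATUS.search(line)
    if status is None:
        return None
    result = {
        "host": name(HOST_STATUS, int(status.group("host"), 16)),
        "device": name(DEVICE_STATUS, int(status.group("dev"), 16)),
        "opcode": None,
        "sense": None,
        "sense_valid": False,
    }
    op = OPCODE.search(line)
    if op:
        result["opcode"] = name(OPCODES, int(op.group("op"), 16))
    if status.group("kind"):
        key = int(status.group("key"), 16)
        code = (int(status.group("asc"), 16), int(status.group("ascq"), 16))
        result["sense"] = (name(SENSE_KEYS, key), ASC.get(code, "unknown"))
        result["sense_valid"] = status.group("kind") == "Valid"
    return result

line = ('cpu22:32827)ScsiDeviceIO: 2338: Cmd(0x41368551df80) 0x1a, CmdSN 0x18078b '
        'from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 '
        'Valid sense data: 0x5 0x24 0x0.')
print(decode_status(line))
```

Read this way, the `mpx.vmhba33` entries above look like local devices rejecting MODE SENSE(6) with ILLEGAL REQUEST, which is often harmless, whereas the `H:0x8` write failures on the `naa.*` LUNs earlier in the thread would indicate resets on the storage path. That is only my reading of the codes, not a confirmed diagnosis.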
Do you have a solution? We are on the same build and are having the same issue on very similar Dell hardware (R710HD Blades).
VMware logs:
2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000cd" state in doubt; requested fast path state update...
2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2325: Cmd(0x412e85a5a5c0) 0x2a, CmdSN 0xe8d98 from world 32797 to dev "naa.6000d310003e7f0000000000000000cd" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.
2015-10-02T16:00:25.040Z cpu12:33080)WARNING: HBX: 1884: HB [HB state abcdef02 offset 3522560 gen 747 stampUS 1515624202353 uuid 55f78c4b-67d31376-623b-848f69734229 jrnl <FB 411478> drv 14.60] was aborted on vol 'R-EXCHMB06 LUN 361', disk conten$
2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e88e5ad80, 32797) to dev "naa.6000d310003e7f0000000000000000e6" on path "vmhba37:C0:T5:L29" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0xe 0x1d 0x0. Act:EVAL
2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000e6" state in doubt; requested fast path state update...
2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2338: Cmd(0x412e88e5ad80) 0x2a, CmdSN 0xd390b from world 32797 to dev "naa.6000d310003e7f0000000000000000e6" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0xe 0x1d 0x0.
2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e832ade80, 32797) to dev "naa.6000d310003e7f0000000000000000e8" on path "vmhba37:C0:T5:L45" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL
2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000e8" state in doubt; requested fast path state update...
2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2338: Cmd(0x412e832ade80) 0x2a, CmdSN 0xd394f from world 32797 to dev "naa.6000d310003e7f0000000000000000e8" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e86fa0e00, 32797) to dev "naa.6000d310003e7f00000000000000006f" on path "vmhba37:C0:T5:L38" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:EVAL
2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f00000000000000006f" state in doubt; requested fast path state update...
VMware is telling us there is a latency issue between the host and the SAN, but has not provided anything more useful. Did you get anywhere?
Thanks
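One way to start narrowing down a "latency between the host and the SAN" claim is to check whether the failures cluster behind a single adapter. The Python sketch below counts `nmp_ThrottleLogForDevice` failures per adapter and host status; the function name `failures_by_adapter` and the regular expression are my own assumptions based on the line format shown in this thread, and the sample lines are taken from it.

```python
import re
from collections import Counter

# Captures the adapter part of the path (e.g. "vmhba37") and the host status.
PATH_FAILURE = re.compile(
    r'to dev "(?P<dev>[^"]+)" on path "(?P<adapter>vmhba\d+):C\d+:T\d+:L\d+" '
    r"Failed: H:(?P<host>0x[0-9a-f]+)"
)

def failures_by_adapter(lines):
    """Count failed commands per (adapter, host status) pair."""
    counts = Counter()
    for line in lines:
        match = PATH_FAILURE.search(line)
        if match:
            counts[(match.group("adapter"), match.group("host"))] += 1
    return counts

sample = [
    'NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e88e5ad80, 32797) to dev "naa.6000d310003e7f0000000000000000e6" on path "vmhba37:C0:T5:L29" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0xe 0x1d 0x0. Act:EVAL',
    'NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e832ade80, 32797) to dev "naa.6000d310003e7f0000000000000000e8" on path "vmhba37:C0:T5:L45" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL',
    'NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e86fa0e00, 32797) to dev "naa.6000d310003e7f00000000000000006f" on path "vmhba37:C0:T5:L38" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:EVAL',
    'NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x12 (0x41368551df80, 0) to dev "naa.6005" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE',
]

for (adapter, host_status), count in failures_by_adapter(sample).most_common():
    print(adapter, "H:" + host_status, count)
```

If nearly all of the `H:0x5` aborts sit behind one adapter (vmhba37 in the lines above), that would suggest looking first at that HBA and the path it uses to reach the array; a spread across adapters would point more toward the array side. This is a heuristic for deciding where to look, not a diagnosis.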
