VMware Cloud Community
JohnsVCP5
Enthusiast

ESXi 5.5 host went into Not Responding status in vCenter

One ESXi 5.5 host in a cluster went into Not Responding status in vCenter. We tried restarting the management services, but the restart got stuck partway through. We also tried reconnecting the host to vCenter, but that failed.

We are also unable to connect to the host directly with the vSphere Client.

The vmkernel log shows the following messages:

ALERT:hostd detected to be non-responsive

World: 14302: VC opID hostd-6e72 maps to vmkernel opID 6e44ae8a

)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB

7 Replies
Alistar
Expert

Hi there,

According to the message, it seems someone might have changed the memory reservations for the ESXi services to "free up some memory"; the snmpd daemon in particular appears to be affected. The setting can be found in the thick client under Configuration -> Software -> System Resource Allocation -> Advanced; search there for "snmpd". If you do not use SNMP, you can try disabling it by following this documentation: vSphere Documentation Center

It could also be a bug in the ESXi version you are using, in which case updating to the latest patch might help. Alternatively, your host could be running out of memory quickly, but in that case I would expect it to start swapping to disk rather than hang.

Can you please post vmkernel.log and vmkwarning.log from /var/log that you can retrieve via WinSCP after connecting to the affected ESXi host?

Stop by my blog if you'd like :) I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
JohnsVCP5
Enthusiast

Hi Alistar,

I was unable to connect with WinSCP to pull the requested logs, so I collected them over SSH instead.

vmkwarning.log

T05:16:00.638Z cpu11:9146523)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.55664" state in doubt; requested fast path state update...

T05:20:01.652Z cpu22:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:20:38.177Z cpu24:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:21:14.391Z cpu24:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:21:35.250Z cpu20:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:22:54.881Z cpu20:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:23:14.257Z cpu18:11879011)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:23:32.880Z cpu34:33764)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

T05:26:31.506Z cpu46:12036491)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB

T05:26:32.170Z cpu13:12036500)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB

T05:26:32.819Z cpu23:12036509)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB

T05:26:33.468Z cpu36:12036518)WARNING: MemSched: 15647: Group snmpd: Requested memory limit 0 KB insufficient to support effective reservation 7368 KB

vmkernel.log

cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu19:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu35:33764)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu18:33582)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded

cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x4137011d5680) 0x2a, CmdSN 0x800e0045 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0

cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x41370020ca40) 0x2a, CmdSN 0x800e0079 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0

cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x413704673b80) 0x2a, CmdSN 0x800e0061 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0

cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x413702caaec0) 0x2a, CmdSN 0x800e0059 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0

cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 70 flags 3

cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 112 status FCPIO_SUCCESS

cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS

cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 71 flags 3

cpu35:33759)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600" state in doubt; requested fast path state update...

2015-08-06T17:52:59.463Z cpu35:33759)ScsiDeviceIO: 2307: Cmd(0x413701462ec0) 0x2a, CmdSN 0x800e003a from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0

cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 113 status FCPIO_SUCCESS

cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS

cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 72 flags 3

cpu29:33163)<7>fnic : 1 :: abts cmpl recd. id 114 status FCPIO_SUCCESS

cpu55:32921)<7>fnic : 1 :: Returning from abort cmd type 2 SUCCESS
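With this many repeated lines, it can help to tally the entries by component before reading them one at a time. The following is a minimal Python sketch, not a VMware tool. The function name `tally_components` and the regular expression are assumptions based only on the line shapes pasted above, so other vmkernel message formats may not match.

```python
import re
from collections import Counter

# Assumed line shape: "cpuN:worldID)", an optional "<level>" tag,
# an optional "WARNING: " marker, then the component name and a colon.
LINE_RE = re.compile(r"cpu\d+:\d+\)(?:<\d+>)?(WARNING: )?\s*([A-Za-z0-9_]+)\s*:")

def tally_components(lines):
    """Count log entries per (component, severity) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            severity = "WARNING" if m.group(1) else "info"
            counts[(m.group(2), severity)] += 1
    return counts

# Sample lines copied from the excerpt above.
sample = [
    'cpu16:12036416)WARNING: UserMemTouched: 779: Maximum number of cartels per userspace exceeded',
    'cpu55:32900)ScsiDeviceIO: 2307: Cmd(0x4137011d5680) 0x2a, CmdSN 0x800e0045 from world 50250 to dev "naa.6050" failed H:0x8 D:0x0 P:0x0',
    'cpu55:32921)<7>fnic : 1 :: Abort Cmd called FCID 0xd70260, LUN 0x7 TAG 70 flags 3',
    'cpu35:33759)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.600" state in doubt; requested fast path state update...',
]

for (component, severity), n in sorted(tally_components(sample).items()):
    print(component, severity, n)
```

Run against a full copy of vmkernel.log (for example `tally_components(open("vmkernel.log"))`), this gives a rough picture of which subsystem is producing most of the noise.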

linotelera
Hot Shot

The "Maximum number of cartels per userspace exceeded" messages look like the same issue described in another post, "Help with hung up host".

That post still has no answer at the moment, but as Alistar said, try patching the hypervisor and see what happens.

Just a few questions: is your hardware healthy, and is your workload free of contention? Could you post basic information such as memory and CPU in use and available? Are there any non-default cluster settings, such as EVC?

JohnsVCP5
Enthusiast

Hi

The hosts are Dell R710s with drivers, BIOS, and firmware updated. EVC mode is not enabled on the cluster, and enough resources are available.

ESXi 5.5 Patch 5 re-release, 2015-05-08, build 2718055
linotelera
Hot Shot

OK, I suggest opening a support request with VMware and/or Dell.

Did you install the ESXi image downloaded from Dell? Vendors always recommend using their customized ESXi image.

Regards

JohnsVCP5
Enthusiast

I have installed only the Dell customized image on my ESXi hosts.

Today 5 more hosts went into Not Responding status.

NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x41368551df80, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

cpu22:32827)ScsiDeviceIO: 2338: Cmd(0x41368551df80) 0x1a, CmdSN 0x18078b from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

cpu22:32827)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x41368551df80, 0) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

cpu22:32827)ScsiDeviceIO: 2338: Cmd(0x41368551df80) 0x1a, CmdSN 0x18078c from world 0 to dev "mpx.vmhba33:C0:T0:L1" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

cpu16:2305207)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x12 (0x41368551df80, 0) to dev "naa.6005" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

cpu4:45191)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e833bc040, 0) to dev "naa.6005" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

cpu4:32801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e42962340, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

cpu4:32801)ScsiDeviceIO: 2338: Cmd(0x412e42962340) 0x1a, CmdSN 0x1eab440 from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

cpu4:32801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e42962340, 0) to dev "mpx.vmhba33:C0:T0:L1" on path "vmhba33:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

cpu14:16489356)ScsiDeviceIO: 2338: Cmd(0x413640001b00) 0x4d, CmdSN 0x3256 from world 25161080 to dev "naa.60051d05" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

cpu14:16489356)ScsiDeviceIO: 2338: Cmd(0x413640001b00) 0x1a, CmdSN 0x3257 from world 25161080 to dev "naa.6005005" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.

cpu12:25161184)WARNING: UserEpoll: 542: UNSUPPORTED events 0x40

cpu12:25161184)WARNING: LinuxSocket: 1854: UNKNOWN/UNSUPPORTED socketcall op (whichCall=0x12, args@0xffb01cac)

cpu6:32803)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e4b263ec0, 0) to dev "mpx.vmhba5:C0:T0:L0" on path "vmhba5:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

cpu6:32803)ScsiDeviceIO: 2338: Cmd(0x412e4b263ec0) 0x1a, CmdSN 0xe8601a from world 0 to dev "mpx.vmhba5:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

cpu11:32808)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x4136438aa100, 0) to dev "mpx.vmhba5:C0:T0:L0" on path "vmhba5:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

cpu9:33005)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x12 (0x412e838e3740, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

cpu23:250801)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x413682fa6100, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

cpu9:2305202)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e89606e40, 0) to dev "naa.600171a6196fca06" on path "vmhba0:C1:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE

cpu5:32810)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x1a (0x412e86059a80, 0) to dev "mpx.vmhba33:C0:T0:L0" on path "vmhba33:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE

cpu5:32810)ScsiDeviceIO: 2338: Cmd(0x412e86059a80) 0x1a, CmdSN 0x173c73 from world 0 to dev "mpx.vmhba33:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

WARNING: HBX: 1884: HB [HB state abcdef02 offset 3952640 gen 7019 stampUS 6932991013173 uuid 5558819e-223c42e2-8639-0025b5270a3f jrnl <FB 1523> drv 14.60] was aborted on vol 'Scratch-', disk contents u$

cpu8:11251940)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a53737741" state in doubt; requested fast path state update...

cpu4:32951)WARNING: J3: 3297: Error committing txn callerID: 0xc1d0000f to slot 0: IO was aborted by VMFS via a virt-reset on the device

cpu26:11531505)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a7741" state in doubt; requested fast path state update...

cpu26:11531505)WARNING: HBX: 1884: HB [HB state abcdef02 offset 3952640 gen 7019 stampUS 6932997945332 uuid 5558819e-223c42e2-8639-0025b5270a3f jrnl <FB 1523> drv 14.60] was aborted on vol 'Scratch-', disk conten$

cpu2:32951)WARNING: J3: 3297: Error committing txn callerID: 0xc1d0000f to slot 0: IO was aborted by VMFS via a virt-reset on the device

cpu25:12973300)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60a7741" state in doubt; requested fast path state update...

2015-08-05T17:45:50.608Z cpu25:12973300)WARNING: HB
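The H:/D:/P: triplets and the sense data in these lines can be decoded mechanically. A minimal Python sketch follows. `decode_scsi_status` is a hypothetical helper name, and the lookup tables are a partial mapping based on my understanding of the standard SCSI status and sense key values and of the host status codes VMware documents for ESXi, so check them against VMware's article on interpreting SCSI sense codes before relying on them.

```python
import re

# Partial tables (assumption: values as commonly documented for ESXi / SCSI).
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT",
               0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEY = {0x0: "NO SENSE", 0x2: "NOT READY", 0x3: "MEDIUM ERROR",
             0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
             0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND", 0xE: "MISCOMPARE"}

HEX = r"(0x[0-9a-fA-F]+)"
STATUS_RE = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX}"
    rf"(?: (?:Valid|Possible) sense data: {HEX} {HEX} {HEX})?"
)

def decode_scsi_status(line):
    """Return a dict describing the status fields, or None if absent."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    host, device, plugin = (int(m.group(i), 16) for i in (1, 2, 3))
    result = {
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": hex(plugin),
    }
    if m.group(4) is not None:
        key, asc, ascq = (int(m.group(i), 16) for i in (4, 5, 6))
        result["sense_key"] = SENSE_KEY.get(key, hex(key))
        result["asc_ascq"] = (hex(asc), hex(ascq))  # look up in the T10 ASC/ASCQ list
    return result

print(decode_scsi_status('failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.'))
print(decode_scsi_status('to dev "naa.6050" failed H:0x8 D:0x0 P:0x0'))
```

Under these tables, the first line is a device-side check condition with sense key ILLEGAL REQUEST, while the H:0x8 lines from the earlier post report a host-side reset with no sense data.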

mmermelstein
Contributor

Did you find a solution? We are on the same build and are having the same issue on similar Dell hardware (R710HD Blades).

VMware logs:

2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000cd" state in doubt; requested fast path state update...

2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2325: Cmd(0x412e85a5a5c0) 0x2a, CmdSN 0xe8d98 from world 32797 to dev "naa.6000d310003e7f0000000000000000cd" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.

2015-10-02T16:00:25.040Z cpu12:33080)WARNING: HBX: 1884: HB [HB state abcdef02 offset 3522560 gen 747 stampUS 1515624202353 uuid 55f78c4b-67d31376-623b-848f69734229 jrnl <FB 411478> drv 14.60] was aborted on vol 'R-EXCHMB06 LUN 361', disk conten$

2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e88e5ad80, 32797) to dev "naa.6000d310003e7f0000000000000000e6" on path "vmhba37:C0:T5:L29" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0xe 0x1d 0x0. Act:EVAL

2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000e6" state in doubt; requested fast path state update...

2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2338: Cmd(0x412e88e5ad80) 0x2a, CmdSN 0xd390b from world 32797 to dev "naa.6000d310003e7f0000000000000000e6" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0xe 0x1d 0x0.

2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e832ade80, 32797) to dev "naa.6000d310003e7f0000000000000000e8" on path "vmhba37:C0:T5:L45" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000e8" state in doubt; requested fast path state update...

2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2338: Cmd(0x412e832ade80) 0x2a, CmdSN 0xd394f from world 32797 to dev "naa.6000d310003e7f0000000000000000e8" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-10-02T16:00:25.040Z cpu12:33080)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e86fa0e00, 32797) to dev "naa.6000d310003e7f00000000000000006f" on path "vmhba37:C0:T5:L38" Failed: H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0. Act:EVAL

2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f00000000000000006f" state in doubt; requested fast path state update...
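To see whether the failures cluster on one LUN or hit many LUNs at the same moment, the entries can be grouped by device and minute. This is a minimal Python sketch, assuming each line starts with an ISO timestamp as in the excerpt above; `events_per_device_minute` is a hypothetical name, not part of any VMware tooling.

```python
import re
from collections import defaultdict

# Device IDs appear as: to dev "naa..."  or  NMP device "naa..."
DEV_RE = re.compile(r'(?:to dev|NMP device) "([^"]+)"')
# Keep the timestamp down to the minute, e.g. 2015-10-02T16:00
TS_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}")

def events_per_device_minute(lines):
    """Count storage-related entries per (device, minute) bucket."""
    buckets = defaultdict(int)
    for line in lines:
        ts = TS_RE.match(line)
        dev = DEV_RE.search(line)
        if ts and dev:
            buckets[(dev.group(1), ts.group(1))] += 1
    return dict(buckets)

# Sample lines copied from the excerpt above.
sample = [
    '2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000cd" state in doubt; requested fast path state update...',
    '2015-10-02T16:00:25.040Z cpu12:33080)ScsiDeviceIO: 2325: Cmd(0x412e85a5a5c0) 0x2a, CmdSN 0xe8d98 from world 32797 to dev "naa.6000d310003e7f0000000000000000cd" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0.',
    '2015-10-02T16:00:25.040Z cpu12:33080)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000d310003e7f0000000000000000e6" state in doubt; requested fast path state update...',
]

for (device, minute), n in sorted(events_per_device_minute(sample).items()):
    print(minute, device, n)
```

If many different devices show events in the same minute, that would be consistent with a problem on a shared path or on the array rather than on a single LUN, though the counts alone do not prove either.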


VMware is telling us there's a latency issue between the host and the SAN, but did not provide anything more useful. Did you get anywhere?


Thanks
