VMware Cloud Community
dssystems
Contributor
Contributor

ESXI 6.5 management services hangs after a couple of days (max. 35-40 days)

Hi,

First of all, we will explain the cluster we have. We have two Dell R520 with updated 6.5 ESXi Dell images of the hypervisor. We have used a USB key to install the OS of the hypervisor (instead of the Dual SD Card Module - we don't have it installed) in each of the servers.

After a couple of days, the same hypervisor is hanging partially. We detect the problem because when we are going to restart a VM, make a VM shutdown or start, or make some operations (as VM Snapshot creation), the task does not complete. - Vsphere HTML5 interfaces shown the task (for example, register VM, at 0%, 10% or a very slow process that completely hangs at 99%). We can't cancel the task, it appears random error message like "operation can not be completed" or directly we can't make click in the cancel icon.

We login in the hypervisor web client directly, and when we try to get the VM inventory (only in that tab), we get a very slow response (normally clients disconnects with "connection lost") We have in that hypervisor Veeam B&R in 9.5 U4b version to make the backups of the VMs. For example, today after 35 days we have received alerts that says us backup process can not be completed. The VM that failed was powered on (the guest os worked perfectly) but the vm appears as invalid in the VM list. We have shutdown it, we have remove it from inventory, but when we tried to register again, it has been impossible. More than 60 minutes and the task hangs at 99%.

We have connected by SSH to investigate the problem, and we have shown this in logs.

vmkwarning.log

2020-03-14T11:48:17.907Z cpu31:65616)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:48:40.201Z cpu25:65610)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:49:02.504Z cpu22:65607)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:49:24.804Z cpu22:65607)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:49:47.104Z cpu17:65602)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:50:09.401Z cpu17:65602)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:50:31.697Z cpu31:65616)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:50:53.864Z cpu23:65608)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:51:16.166Z cpu17:65602)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:51:38.464Z cpu19:65604)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:52:00.758Z cpu19:65604)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:52:23.061Z cpu27:65612)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:52:45.361Z cpu24:65609)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T11:53:07.660Z cpu24:65609)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

If we try to get information about the device...

esxcli storage core device list -d mpx.vmhba32:C0:T0:L0

... none. Command hangs and no response from SSH cli.

In the other hypervisor (the one that works like a charm), the response is...

mpx.vmhba32:C0:T0:L0

   Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)

   Has Settable Display Name: false

   Size: 7400

   Device Type: Direct-Access

   Multipath Plugin: NMP

   Devfs Path: /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0

   Vendor: TOSHIBA

   Model: TransMemory    

   Revision: 0100

   SCSI Level: 2

   Is Pseudo: false

   Status: on

   Is RDM Capable: false

   Is Local: true

   Is Removable: true

   Is SSD: false

   Is VVOL PE: false

   Is Offline: false

   Is Perennially Reserved: false

   Queue Full Sample Size: 0

   Queue Full Threshold: 0

   Thin Provisioning Status: unknown

   Attached Filters:

   VAAI Status: unsupported

   Other UIDs: vml.0000000000766d68626133323a303a30

   Is Shared Clusterwide: false

   Is Local SAS Device: false

   Is SAS: false

   Is USB: true

   Is Boot USB Device: true

   Is Boot Device: true

   Device Max Queue Depth: 1

   No of outstanding IOs with competing worlds: 1

   Drive Type: unknown

   RAID Level: unknown

   Number of Physical Drives: unknown

   Protection Enabled: false

   PI Activated: false

   PI Type: 0

   PI Protection Mask: NO PROTECTION

   Supported Guard Types: NO GUARD SUPPORT

   DIX Enabled: false

   DIX Guard Type: NO GUARD SUPPORT

   Emulated DIX/DIF Enabled: false

That id is from the pendrive that has the hypervisor OS. We suppose that ID is the same in both hypervisors...

If we look at the faulty hypervisor again... file VMkernel.log

in the beginning of the file...

2020-03-14T05:41:41.184Z cpu4:611239)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:41:46.356Z cpu2:65587)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:41:46.356Z cpu2:65587)ScsiDeviceIO: 3015: Cmd(0x439510b358c0) 0x28, CmdSN 0x1 from world 67214 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x

2020-03-14T05:42:03.185Z cpu10:65885)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1002ms in the past

2020-03-14T05:42:03.185Z cpu10:65885)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:42:08.627Z cpu2:65587)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:42:08.627Z cpu2:633249)VFAT: 4593: Failed to get object 36 type 2 uuid 5e3d7952-b8e7916b-ebd7-90b11c4405a0 cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :Storage initiator error

2020-03-14T05:42:15.186Z cpu7:609657)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1002ms in the past

2020-03-14T05:42:19.764Z cpu2:65587)ScsiDeviceIO: 3015: Cmd(0x4395013b4140) 0x9e, CmdSN 0x169c5d from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0

2020-03-14T05:42:20.184Z cpu9:66121)NMP: nmp_ResetDeviceLogThrottling:3458: last error status from device mpx.vmhba32:C0:T0:L0 repeated 5 times

2020-03-14T05:42:26.188Z cpu7:609657)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1002ms in the past

2020-03-14T05:42:26.188Z cpu7:609657)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:42:30.901Z cpu2:65587)NMP: nmp_ThrottleLogForDevice:3630: Cmd 0x9e (0x4395013b4140, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Invalid se

2020-03-14T05:42:30.901Z cpu2:65587)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:42:48.185Z cpu14:622426)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1001ms in the past

2020-03-14T05:42:48.185Z cpu14:622426)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:42:53.173Z cpu2:65587)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:42:53.173Z cpu2:65587)ScsiDeviceIO: 3015: Cmd(0x4395013b4140) 0x25, CmdSN 0x169cbe from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0

2020-03-14T05:43:10.184Z cpu2:611807)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1000ms in the past

2020-03-14T05:43:10.184Z cpu2:611807)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:43:15.458Z cpu3:65588)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:43:20.184Z cpu12:66121)NMP: nmp_ResetDeviceLogThrottling:3458: last error status from device mpx.vmhba32:C0:T0:L0 repeated 4 times

2020-03-14T05:43:26.608Z cpu3:65588)NMP: nmp_ThrottleLogForDevice:3630: Cmd 0x1a (0x4395013b4140, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Invalid se

2020-03-14T05:43:26.608Z cpu3:65588)ScsiDeviceIO: 3015: Cmd(0x4395013b4140) 0x1a, CmdSN 0x169cbf from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x22 0x20

2020-03-14T05:43:32.184Z cpu1:609329)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1000ms in the past

2020-03-14T05:43:32.184Z cpu1:609329)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:43:37.760Z cpu3:65588)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:43:44.186Z cpu9:622457)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1000ms in the past

2020-03-14T05:43:55.186Z cpu9:622457)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1000ms in the past

2020-03-14T05:43:55.186Z cpu9:622457)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:44:00.056Z cpu3:65588)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:44:00.056Z cpu3:65588)ScsiDeviceIO: 3015: Cmd(0x4395013b4140) 0x28, CmdSN 0x1 from world 67214 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x

2020-03-14T05:44:17.184Z cpu9:611153)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1000ms in the past

2020-03-14T05:44:17.184Z cpu9:611153)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

2020-03-14T05:44:20.184Z cpu3:66121)NMP: nmp_ResetDeviceLogThrottling:3458: last error status from device mpx.vmhba32:C0:T0:L0 repeated 4 times

2020-03-14T05:44:22.355Z cpu3:65588)NMP: nmp_ThrottleLogForDevice:3630: Cmd 0x28 (0x4395013b4140, 67214) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Invali

2020-03-14T05:44:22.355Z cpu3:65588)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2020-03-14T05:44:22.355Z cpu3:82429)VFAT: 4593: Failed to get object 36 type 2 uuid 5e3d7952-b8e7916b-ebd7-90b11c4405a0 cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :Storage initiator error

At now...

2020-03-14T12:06:52.889Z cpu22:65607)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:06:52.889Z cpu22:65607)ScsiDeviceIO: 3015: Cmd(0x439510aa4bc0) 0x9e, CmdSN 0x16c0dc from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:04.036Z cpu22:65607)ScsiDeviceIO: 3015: Cmd(0x439d0591c040) 0x1a, CmdSN 0x16c0dd from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:05.788Z cpu14:65599)ScsiDeviceIO: 3015: Cmd(0x439510aa4bc0) 0x9e, CmdSN 0x16c0dc from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:11.293Z cpu24:622457)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry               

2020-03-14T12:07:15.080Z cpu14:65599)ScsiDeviceIO: 3015: Cmd(0x439d0591c040) 0x1a, CmdSN 0x16c0dd from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:15.184Z cpu22:65607)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:07:15.184Z cpu22:65607)ScsiDeviceIO: 3015: Cmd(0x43950dd8c380) 0x1a, CmdSN 0x16c0de from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:21.787Z cpu14:65599)ScsiDeviceIO: 3015: Cmd(0x43950dd8c380) 0x1a, CmdSN 0x16c0de from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:26.334Z cpu22:65607)ScsiDeviceIO: 3015: Cmd(0x439510aa4bc0) 0x25, CmdSN 0x16c0df from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:32.293Z cpu16:611807)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry               

2020-03-14T12:07:37.483Z cpu16:65601)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:07:37.483Z cpu16:65601)ScsiDeviceIO: 3015: Cmd(0x43950dd8c380) 0x28, CmdSN 0x1 from world 661564 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:07:50.293Z cpu29:66121)NMP: nmp_ResetDeviceLogThrottling:3458: last error status from device mpx.vmhba32:C0:T0:L0 repeated 5 times                                                 

2020-03-14T12:07:53.293Z cpu20:65885)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry                

2020-03-14T12:07:55.081Z cpu4:65589)ScsiDeviceIO: 3015: Cmd(0x439d0591c040) 0x28, CmdSN 0x1 from world 67214 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x

2020-03-14T12:07:55.081Z cpu29:67976)VFAT: 4593: Failed to get object 36 type 2 uuid 5e3d7952-b8e7916b-ebd7-90b11c4405a0 cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :Timeout                     

2020-03-14T12:07:57.936Z cpu16:65601)NMP: nmp_ThrottleLogForDevice:3630: Cmd 0x28 (0x43950dd8c380, 661564) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Inva

2020-03-14T12:07:57.936Z cpu16:65601)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:08:01.789Z cpu4:65589)ScsiDeviceIO: 3015: Cmd(0x43950dd8c380) 0x28, CmdSN 0x1 from world 661564 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0

2020-03-14T12:08:01.789Z cpu7:661564)VFAT: 4593: Failed to get object 36 type 2 uuid ded6d1e0-dadb0f65-35ce-21b30441be7d cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :Timeout                     

2020-03-14T12:08:09.082Z cpu26:65611)ScsiDeviceIO: 3015: Cmd(0x439510aa4bc0) 0x1a, CmdSN 0x16c0ea from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:08:15.294Z cpu21:611239)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1001ms in the past                                 

2020-03-14T12:08:15.294Z cpu21:611239)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry               

2020-03-14T12:08:20.234Z cpu26:65611)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:08:20.234Z cpu26:65611)ScsiDeviceIO: 3015: Cmd(0x439d00afa480) 0x9e, CmdSN 0x16c0eb from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:08:37.341Z cpu5:611153)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1047ms in the past                                  

2020-03-14T12:08:37.341Z cpu5:611153)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry                

2020-03-14T12:08:42.874Z cpu26:65611)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...                   

2020-03-14T12:08:42.874Z cpu26:65611)ScsiDeviceIO: 3015: Cmd(0x439510aa4bc0) 0x28, CmdSN 0x1 from world 661533 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0

2020-03-14T12:08:49.299Z cpu1:609657)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1002ms in the past                                  

2020-03-14T12:08:50.294Z cpu24:66121)NMP: nmp_ResetDeviceLogThrottling:3458: last error status from device mpx.vmhba32:C0:T0:L0 repeated 5 times                                                 

2020-03-14T12:08:54.028Z cpu21:65606)NMP: nmp_ThrottleLogForDevice:3630: Cmd 0x25 (0x439d00afa480, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Invalid s

2020-03-14T12:08:54.028Z cpu21:65606)ScsiDeviceIO: 3015: Cmd(0x439d00afa480) 0x25, CmdSN 0x16c0ec from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Invalid sense data: 0x30 0x2

2020-03-14T12:09:00.302Z cpu1:609657)ScsiPath: 5149: Command 0x0 (cmdSN 0x0, World 0) to path vmhba32:C0:T0:L0 timed out: expiry time occurs 1003ms in the past                                  

2020-03-14T12:09:00.302Z cpu1:609657)VMW_SATP_LOCAL: satp_local_updatePath:836: Failed to update path "vmhba32:C0:T0:L0" state. Status=Transient storage condition, suggest retry

In syslog.log

2020-03-14T12:11:25Z sfcb-vmware_base[67999]: updateHealthStateFromSel(637) Invalid "RecordData" in Instance of root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25" {  PATH: root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25"  RecordID = "25" ;  RecordFormat = "*string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2*uint8 IPMI_EventData3*uint32 IANA*" ;  RecordData = "*100*100.0.32*25 0

2020-03-14T12:11:25Z sfcb-vmware_base[67999]: *2*147 179 61 94*32 0*4*27*100*100*false*111*1*255*255*1*" ;  MessageTimestamp = "20200207185931.000000+000" ;  LogName = "IPMI SEL" ;  LogCreationClassName = "OMC_IpmiRecordLog" ;  Description = "Assert + Cable/Interconnect Config Error" ;  CreationClassName = "OMC_IpmiLogRecord" ; }

2020-03-14T12:12:06Z snmpd: load_lags: no port group configs found

2020-03-14T12:12:06Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:12:06Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:12:06Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:13:21Z snmpd: load_lags: no port group configs found

2020-03-14T12:13:21Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:13:21Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:13:21Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:13:25Z sfcb-vmware_base[67999]: updateHealthStateFromSel(637) Invalid "RecordData" in Instance of root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25" {  PATH: root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25"  RecordID = "25" ;  RecordFormat = "*string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2*uint8 IPMI_EventData3*uint32 IANA*" ;  RecordData = "*100*100.0.32*25 0

2020-03-14T12:13:25Z sfcb-vmware_base[67999]: *2*147 179 61 94*32 0*4*27*100*100*false*111*1*255*255*1*" ;  MessageTimestamp = "20200207185931.000000+000" ;  LogName = "IPMI SEL" ;  LogCreationClassName = "OMC_IpmiRecordLog" ;  Description = "Assert + Cable/Interconnect Config Error" ;  CreationClassName = "OMC_IpmiLogRecord" ; }

2020-03-14T12:14:36Z snmpd: load_lags: no port group configs found

2020-03-14T12:14:36Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:14:36Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:14:36Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:15:00Z crond[66574]: USER root pid 661641 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh

2020-03-14T12:15:01Z syslog[661644]: starting hostd probing.

2020-03-14T12:15:26Z sfcb-vmware_base[67999]: updateHealthStateFromSel(637) Invalid "RecordData" in Instance of root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25" {  PATH: root/cimv2:OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20200207185931.000000+000",RecordID="25"  RecordID = "25" ;  RecordFormat = "*string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2*uint8 IPMI_EventData3*uint32 IANA*" ;  RecordData = "*100*100.0.32*25 0

2020-03-14T12:15:26Z sfcb-vmware_base[67999]: *2*147 179 61 94*32 0*4*27*100*100*false*111*1*255*255*1*" ;  MessageTimestamp = "20200207185931.000000+000" ;  LogName = "IPMI SEL" ;  LogCreationClassName = "OMC_IpmiRecordLog" ;  Description = "Assert + Cable/Interconnect Config Error" ;  CreationClassName = "OMC_IpmiLogRecord" ; }

2020-03-14T12:15:51Z snmpd: load_lags: no port group configs found

2020-03-14T12:15:51Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:15:51Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

2020-03-14T12:15:51Z snmpd: lookup_vswitch: fetch VSI_MODULE_NODE_PortCfgs_properties failed Not found

Nothing else relevant in other log files...

vmksummary.log

2020-03-13T07:00:00Z heartbeat: up 34d11h34m35s, 10 VMs; [[242166 vmx 8015872kB] [129039 vmx 8384748kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T08:00:01Z heartbeat: up 34d12h34m35s, 10 VMs; [[242166 vmx 8017920kB] [129039 vmx 8384944kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T09:00:01Z heartbeat: up 34d13h34m35s, 10 VMs; [[242166 vmx 8017920kB] [129039 vmx 8385800kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T10:00:01Z heartbeat: up 34d14h34m35s, 10 VMs; [[242166 vmx 8007680kB] [129039 vmx 8385032kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T11:00:01Z heartbeat: up 34d15h34m36s, 10 VMs; [[242166 vmx 8017920kB] [129039 vmx 8384744kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T12:00:01Z heartbeat: up 34d16h34m36s, 10 VMs; [[242166 vmx 8028160kB] [129039 vmx 8385772kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T13:00:00Z heartbeat: up 34d17h34m35s, 10 VMs; [[242166 vmx 8028156kB] [129039 vmx 8384668kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T14:00:01Z heartbeat: up 34d18h34m35s, 10 VMs; [[242166 vmx 8028136kB] [129039 vmx 8384736kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T15:00:00Z heartbeat: up 34d19h34m35s, 10 VMs; [[242166 vmx 8028152kB] [129039 vmx 8385832kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T16:00:01Z heartbeat: up 34d20h34m35s, 10 VMs; [[242166 vmx 8028152kB] [129039 vmx 8385464kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T17:00:01Z heartbeat: up 34d21h34m35s, 10 VMs; [[242166 vmx 8030208kB] [129039 vmx 8385024kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T18:00:01Z heartbeat: up 34d22h34m36s, 10 VMs; [[242166 vmx 8036328kB] [129039 vmx 8386020kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T19:00:00Z heartbeat: up 34d23h34m35s, 10 VMs; [[242166 vmx 8036352kB] [129039 vmx 8385780kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-13T20:00:01Z heartbeat: up 35d0h34m35s, 10 VMs; [[242166 vmx 8036352kB] [129039 vmx 8385976kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-13T21:00:00Z heartbeat: up 35d1h34m35s, 10 VMs; [[242166 vmx 8036352kB] [129039 vmx 8386060kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-13T22:00:01Z heartbeat: up 35d2h34m35s, 10 VMs; [[242166 vmx 8036332kB] [129039 vmx 8384328kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-13T23:00:01Z heartbeat: up 35d3h34m35s, 10 VMs; [[242166 vmx 8046564kB] [129039 vmx 8385820kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T00:00:01Z heartbeat: up 35d4h34m35s, 10 VMs; [[242166 vmx 8044528kB] [129039 vmx 8385240kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T01:00:00Z heartbeat: up 35d5h34m35s, 10 VMs; [[242166 vmx 8044528kB] [129039 vmx 8385908kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T02:00:00Z heartbeat: up 35d6h34m35s, 10 VMs; [[242166 vmx 8044512kB] [129039 vmx 8385984kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T03:00:01Z heartbeat: up 35d7h34m35s, 10 VMs; [[242166 vmx 8046576kB] [129039 vmx 8376440kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T04:00:01Z heartbeat: up 35d8h34m35s, 10 VMs; [[242166 vmx 8050688kB] [129039 vmx 8375332kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T05:00:01Z heartbeat: up 35d9h34m35s, 10 VMs; [[242166 vmx 8050688kB] [129039 vmx 8367740kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T06:00:01Z heartbeat: up 35d10h34m35s, 10 VMs; [[242166 vmx 8062968kB] [129039 vmx 8376736kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-14T07:00:01Z heartbeat: up 35d11h34m36s, 10 VMs; [[242166 vmx 7946236kB] [129039 vmx 8375944kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-14T08:00:00Z heartbeat: up 35d12h34m35s, 10 VMs; [[242166 vmx 7946240kB] [129039 vmx 8375820kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-14T09:00:01Z heartbeat: up 35d13h34m35s, 10 VMs; [[242166 vmx 8062968kB] [129039 vmx 8376704kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-14T10:00:00Z heartbeat: up 35d14h34m35s, 10 VMs; [[242166 vmx 8062968kB] [129039 vmx 8375808kB] [121198 vmx 8388608kB]] [[242166 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]       

2020-03-14T11:00:01Z heartbeat: up 35d15h34m35s, 8 VMs; [[517902 vmx 6832408kB] [129039 vmx 8374600kB] [121198 vmx 8388608kB]] [[517902 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]        

2020-03-14T12:00:01Z heartbeat: up 35d16h34m35s, 7 VMs; [[517902 vmx 6826876kB] [129039 vmx 8377732kB] [121198 vmx 8388608kB]] [[517902 vmx 0%max] [129039 vmx 0%max] [121198 vmx 0%max]]

vmkeventd.log

Nothing new from esxi boot date.

2020-02-07T19:14:21Z vmkeventd[66045]: DICT  featMask.evc.cpuid.MWAIT = "Val:1"

2020-02-07T19:14:21Z vmkeventd[66045]: DICT     featMask.evc.cpuid.NX = "Val:1"

2020-02-07T19:14:21Z vmkeventd[66045]: DICT     featMask.evc.cpuid.SS = "Val:1"

2020-02-07T19:14:21Z vmkeventd[66045]: DICT   featMask.evc.cpuid.SSE3 = "Val:1"

2020-02-07T19:14:21Z vmkeventd[66045]: DICT  featMask.evc.cpuid.SSSE3 = "Val:1"

2020-02-07T19:14:21Z vmkeventd[66045]: DICT --- SITE DEFAULTS /usr/lib/vmware/config

2020-02-07T19:14:21Z vmkeventd[66045]: vmkmod: /vmfs is a not a regular file

2020-02-07T19:14:21Z vmkeventd[66045]: VMKMod_ComputeModPath failed for module vmfs: "Bad parameter" (bad0007)

2020-02-07T19:15:04Z mark: storage-path-claim-completed

2020-02-07T19:25:34Z vmkeventd[66047]: DictionaryLoad: Cannot open file "/usr/lib/vmware/config": No such file or directory.

2020-02-07T19:25:34Z vmkeventd[66047]: DICT --- GLOBAL SETTINGS /usr/lib/vmware/settings

2020-02-07T19:25:34Z vmkeventd[66047]: DICT --- NON PERSISTENT (null)

2020-02-07T19:25:34Z vmkeventd[66047]: DICT --- HOST DEFAULTS /etc/vmware/config

2020-02-07T19:25:34Z vmkeventd[66047]: DICT                    libdir = "/usr/lib/vmware"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT           authd.proxy.nfc = "vmware-hostd:ha-nfc"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT        authd.proxy.nfcssl = "vmware-hostd:ha-nfcssl"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT   authd.proxy.vpxa-nfcssl = "vmware-vpxa:vpxa-nfcssl"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT      authd.proxy.vpxa-nfc = "vmware-vpxa:vpxa-nfc"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT            authd.fullpath = "/sbin/authd"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featureCompat.evc.completeMasks = "TRUE"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT  featMask.evc.cpuid.Intel = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.FAMILY = "Val:6"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT  featMask.evc.cpuid.MODEL = "Val:0xf"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.STEPPING = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.NUMLEVELS = "Val:0xa"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.NUM_EXT_LEVELS = "Val:0x80000008"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.CMPXCHG16B = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT     featMask.evc.cpuid.DS = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT featMask.evc.cpuid.LAHF64 = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT     featMask.evc.cpuid.LM = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT  featMask.evc.cpuid.MWAIT = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT     featMask.evc.cpuid.NX = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT     featMask.evc.cpuid.SS = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT   featMask.evc.cpuid.SSE3 = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT  featMask.evc.cpuid.SSSE3 = "Val:1"

2020-02-07T19:25:34Z vmkeventd[66047]: DICT --- SITE DEFAULTS /usr/lib/vmware/config

2020-02-07T19:25:34Z vmkeventd[66047]: vmkmod: /vmfs is a not a regular file

2020-02-07T19:25:34Z vmkeventd[66047]: VMKMod_ComputeModPath failed for module vmfs: "Bad parameter" (bad0007)

2020-02-07T19:26:17Z mark: storage-path-claim-completed

In january we installed the hypervisor OS in the pendrive again, previously making a config backup and restoring in the hypervisor again. Same problem.

It's clearly a hypervisor pendrive problem, or it could be other thing?

We don't know what to do now...

Thanks a lot.

0 Kudos
5 Replies
daphnissov
Immortal
Immortal

We don't know what to do now...

Open a support request with GSS and provide logs.

0 Kudos
dssystems
Contributor
Contributor

Hi,

Thanks for your response daphnissov

Our service support is expired. (you know, in that difficult times, we need to reduce costs, and never we needed to call support until now).

That's the reason we are asking to the community, we don't know who to fix this, because when we apply a correction, and we need to wait 30 - 40 days to verify it has been fixed. And clearly, is not fixed...

In that point, we have restarted completely the server (management services restart doesn't fix the problem), and the server was now operative again. In the log before restart, it appears the boot was damaged (Bootbank cannot be found at path '/bootbank'), but we don't know why, it has restarted correctly. We don't know if this is a problem of the motherboard of the server, a problem of the pendrive, or another thing. 

We need to ask for help, to know the opinion of experts like you to find a solutions.

Thanks for your time.

0 Kudos
Alex_Romeo
Leadership
Leadership

Hi,

I agree with the previous answer, however it could also be a hardware problem, I advise you to open a ticket in Dell for verification.

ARomeo

Blog: https://www.aleadmin.it/
0 Kudos
dssystems
Contributor
Contributor

Hi @AlessandroRomeo68

Thanks for your answer. We have make a complete diagnostics test with the Lifecycle controller of the server, and everything appears to be OK. We are going to change the pendrive, reinstall de Dell ESXi image, restore config and wait...

If it will happen again, we will have to call VMware and ask if there is the option of pay only for resolve this ticket. Do you know if we can pay only to get support for one case?

Really, thanks for your answers.

0 Kudos
Alex_Romeo
Leadership
Leadership

0 Kudos