shakeebkp
Contributor

ESXi 6.7: VM with RDM takes a long time to boot

I have a Linux VM deployed on an ESXi 6.7 host, and I have mapped a few SAMSUNG drives to it as RDMs using the "LSI Logic SAS" SCSI adapter. The VM takes a long time to boot and stays stuck on the BIOS splash screen.

console_delay.PNG

I have seen many threads describing the same issue on ESX 4 and earlier, and all of those issues should have been fixed in later versions.

I ran the following tests, and the VM booted normally in each case:

  1. Changed the SCSI controller type from "LSI Logic SAS" to "VMware Paravirtual". The VM booted immediately.
  2. Changed the SCSI controller type from "LSI Logic SAS" to "LSI Logic Parallel". The VM booted immediately.
  3. Installed ESXi 6.5 and booted the VM with "LSI Logic SAS" itself. The VM booted successfully in that case as well.

So it looks like the issue occurs only with the combination of ESXi 6.7 and LSI Logic SAS.

The AHCI driver versions are:

    # esxcli software vib list | grep ahci

    sata-ahci                      3.0-26vmw.650.1.26.5969303            VMW     VMwareCertified   2017-11-01

    vmw-ahci                       1.0.0-39vmw.650.1.26.5969303          VMW     VMwareCertified   2017-11-01

I am seeing the following entries in vmkernel.log:

    2018-07-05T18:09:59.268Z cpu46:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    2018-07-05T18:10:09.265Z cpu45:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    2018-07-05T18:10:19.266Z cpu35:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    2018-07-05T18:10:29.267Z cpu28:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    2018-07-05T18:10:39.266Z cpu28:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    2018-07-05T18:10:49.267Z cpu28:2098124)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.5002538c4080b3f0" state in doubt; requested fast path state update...

    .....

    2018-07-09T02:34:59.241Z cpu26:2097245)ScsiDeviceIO: 3015: Cmd(0x45a24f339e00) 0x1a, CmdSN 0x168201 from world 0 to dev "t10.NVMe____MTFDHAX800MCE2D1AN1ZABYY_________________006E72380300E0CF" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2018-07-09T03:09:59.295Z cpu4:2097223)ScsiDeviceIO: 3015: Cmd(0x45a24f30d1c0) 0x1a, CmdSN 0x169e52 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    2018-07-09T04:14:59.258Z cpu24:2097243)ScsiDeviceIO: 3015: Cmd(0x45a24f22be00) 0x1a, CmdSN 0x16d268 from world 0 to dev "t10.NVMe____MTFDHAX800MCE2D1AN1ZABYY_________________006E72380300E0CF" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2018-07-09T04:49:59.314Z cpu2:2097221)ScsiDeviceIO: 3015: Cmd(0x45a24f29abc0) 0x1a, CmdSN 0x16ef30 from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    2018-07-09T05:54:59.276Z cpu24:2097243)ScsiDeviceIO: 3015: Cmd(0x45a24f3e7c80) 0x1a, CmdSN 0x17222a from world 0 to dev "t10.NVMe____MTFDHAX800MCE2D1AN1ZABYY_________________006E72380300E0CF" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

    2018-07-09T06:29:59.331Z cpu2:2097221)ScsiDeviceIO: 3015: Cmd(0x45a25ef6f500) 0x1a, CmdSN 0x173efe from world 0 to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

    .......


    2018-07-06T09:02:17.195Z cpu44:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 160 times

    2018-07-06T09:02:17.997Z cpu44:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 320 times

    2018-07-06T09:02:19.601Z cpu44:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 640 times

    2018-07-06T09:02:22.811Z cpu44:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 1280 times

    2018-07-06T09:02:29.227Z cpu32:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 2560 times

    2018-07-06T09:02:42.058Z cpu44:2098124)NMP: nmp_ThrottleLogForDevice:3618: last error status from device naa.5002538c4080b3f5 repeated 5120 times
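
For anyone reading the ScsiDeviceIO entries above, here is a minimal Python sketch that splits such a line into its SCSI opcode, device status and sense data. It is only an illustration: the function name `decode_failure` and the regular expression are assumptions made for this example, and the lookup tables are deliberately partial, covering just the standard SCSI codes that occur in this excerpt (0x1a is MODE SENSE(6), 0x12 is INQUIRY, D:0x2 is CHECK CONDITION, sense key 0x5 is ILLEGAL REQUEST).

```python
import re

# Partial lookup tables: only the standard SCSI codes seen in the excerpt.
OPCODES = {0x12: "INQUIRY", 0x1A: "MODE SENSE(6)"}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION"}
SENSE_KEYS = {0x05: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}

# Hypothetical pattern, written against the line layout shown in this post.
FAILURE_RE = re.compile(
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), .*? to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) '
    r'(?P<validity>Valid|Invalid) sense data: '
    r'(?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)'
)


def decode_failure(line):
    """Return a dict describing one ScsiDeviceIO failure line, or None."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    op = int(m["op"], 16)
    dev_status = int(m["d"], 16)
    result = {
        "device": m["dev"],
        "opcode": OPCODES.get(op, hex(op)),
        "host_status": int(m["h"], 16),
        "device_status": DEVICE_STATUS.get(dev_status, hex(dev_status)),
        "sense_valid": m["validity"] == "Valid",
    }
    # Sense bytes are only meaningful when the log marks them as valid.
    if result["sense_valid"]:
        key = int(m["key"], 16)
        asc = (int(m["asc"], 16), int(m["ascq"], 16))
        result["sense_key"] = SENSE_KEYS.get(key, hex(key))
        result["additional_sense"] = ASC_ASCQ.get(asc, "unknown")
    return result
```

Applied to the entries above, the NVMe device lines decode to MODE SENSE(6) rejected with ILLEGAL REQUEST / INVALID FIELD IN CDB, and the mpx.vmhba0:C0:T0:L0 lines to MODE SENSE(6) rejected with ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE. On its face that means the device refused the command, which is different from a timed-out or aborted I/O.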

Is anyone else facing this issue, and is there a workaround for it?

Thanks

Shakeeb

1 Reply
shakeebkp
Contributor

The system has one NVMe disk, which is why the "t10.NVMe____MTFDHAX800MCE2D1AN1ZABYY_________________006E72380300E0CF..." messages appear. All the other drives are SAMSUNG SSDs.

There are SCSI errors on the SSD drives as well.

      2018-07-05T18:09:38.351Z cpu2:2097221)ScsiDeviceIO: 3029: Cmd(0x459a410abd80) 0x12, CmdSN 0x6a0af from world 0 to dev "naa.5002538c4080b338" failed H:0x3 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.

      2018-07-05T18:09:58.268Z cpu43:2098124)ScsiDeviceIO: 2994: Cmd(0x45a2408b0900) 0x12, CmdSN 0x6a1a5 from world 0 to dev "naa.5002538c4080b338" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x65 0x6e.

      2018-07-05T18:09:58.297Z cpu46:2098124)ScsiDeviceIO: 2994: Cmd(0x45a2408b0900) 0x12, CmdSN 0x6a1a7 from world 0 to dev "naa.5002538c4080b3f0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x73 0x65 0x6e.

      2018-07-05T18:10:38.298Z cpu28:2097247)ScsiDeviceIO: 3029: Cmd(0x45a2408b0900) 0x12, CmdSN 0x6a1a7 from world 0 to dev "naa.5002538c4080b3f0" failed H:0x3 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.

      2018-07-05T18:10:38.299Z cpu28:2098124)ScsiDeviceIO: 2994: Cmd(0x45a2408b0900) 0x12, CmdSN 0x6a1a8 from world 0 to dev "naa.5002538c4080b3f0" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x2 0x1a 0x45.
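
The SSD entries above differ from the NVMe ones in that the failure is reported in the host status field (H:) rather than by the device, and the sense data is flagged as invalid. Below is a rough Python sketch that groups such lines by device and command serial number and measures how long a command was outstanding between its first and last logged failure. The names `parse_failures` and `retry_gaps` and the regular expression are assumptions for this example. The host status names (0x3 as timeout, 0x5 as abort) follow the commonly documented ESXi host status codes and should be checked against VMware's own reference before relying on them.

```python
import re
from datetime import datetime

# Partial table; names assumed from commonly documented ESXi host status codes.
HOST_STATUS = {0x0: "OK", 0x3: "TIMEOUT", 0x5: "ABORT"}

# Hypothetical pattern, written against the line layout shown in this post.
ENTRY_RE = re.compile(
    r'(?P<ts>\d{4}-\d\d-\d\dT[\d:.]+)Z cpu\d+:\d+\)ScsiDeviceIO: \d+: '
    r'Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), CmdSN (?P<sn>0x[0-9a-f]+) '
    r'.*? to dev "(?P<dev>[^"]+)" failed H:(?P<h>0x[0-9a-f]+)'
)


def parse_failures(lines):
    """Return (timestamp, device, CmdSN, host status name) for each failure line."""
    entries = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if m:
            when = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
            code = int(m["h"], 16)
            entries.append((when, m["dev"], m["sn"], HOST_STATUS.get(code, hex(code))))
    return entries


def retry_gaps(entries):
    """Map (device, CmdSN) to (statuses in time order, seconds from first to last)."""
    grouped = {}
    for when, dev, sn, status in entries:
        grouped.setdefault((dev, sn), []).append((when, status))
    gaps = {}
    for key, events in grouped.items():
        if len(events) > 1:  # only commands that were logged more than once
            events.sort()
            seconds = (events[-1][0] - events[0][0]).total_seconds()
            gaps[key] = ([status for _, status in events], seconds)
    return gaps
```

For the lines above, CmdSN 0x6a1a7 on naa.5002538c4080b3f0 is logged twice, at 18:09:58.297 and at 18:10:38.298, so the command (opcode 0x12, INQUIRY) was outstanding for about 40 seconds. Whether that delay is what holds up the BIOS splash screen is not established by the logs alone.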

I am not sure whether these logs are related to our issue, but I am attaching them for your reference.

Also, the AHCI versions I listed in the original post are from ESXi 6.5.

AHCI versions on ESXi 6.5:

    esxcli software vib list | grep ahci

    sata-ahci                      3.0-26vmw.650.1.26.5969303            VMW     VMwareCertified   2017-11-01

    vmw-ahci                       1.0.0-39vmw.650.1.26.5969303          VMW     VMwareCertified   2017-11-01

AHCI versions on ESXi 6.7:

    esxcli software vib list | grep ahci

    sata-ahci                      3.0-26vmw.670.0.0.8169922             VMW     VMwareCertified   2018-06-12

    vmw-ahci                       1.2.0-6vmw.670.0.0.8169922            VMW     VMwareCertified   2018-06-12
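
To make the difference between the two listings explicit, here is a small Python sketch that parses `esxcli software vib list` output rows and reports which drivers changed version. The helper names `parse_vib_list` and `driver_changes` are made up for this example, and treating the text before "vmw" in the version string as the driver version (with the remainder being the ESXi release and build) is an assumption about the naming convention, not something the tool documents here.

```python
def parse_vib_list(text):
    """Map VIB name to version string from 'esxcli software vib list' rows."""
    vibs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns: name, version, vendor, acceptance, date.
        if len(fields) == 5:
            vibs[fields[0]] = fields[1]
    return vibs


def driver_version(vib_version):
    """Assumption: the part before 'vmw' is the driver's own version."""
    return vib_version.split("vmw", 1)[0]


def driver_changes(old, new):
    """Return {name: (old driver version, new driver version)} where they differ."""
    changes = {}
    for name in sorted(set(old) & set(new)):
        before, after = driver_version(old[name]), driver_version(new[name])
        if before != after:
            changes[name] = (before, after)
    return changes


ESXI_65 = """
sata-ahci  3.0-26vmw.650.1.26.5969303    VMW  VMwareCertified  2017-11-01
vmw-ahci   1.0.0-39vmw.650.1.26.5969303  VMW  VMwareCertified  2017-11-01
"""
ESXI_67 = """
sata-ahci  3.0-26vmw.670.0.0.8169922     VMW  VMwareCertified  2018-06-12
vmw-ahci   1.2.0-6vmw.670.0.0.8169922    VMW  VMwareCertified  2018-06-12
"""

if __name__ == "__main__":
    print(driver_changes(parse_vib_list(ESXI_65), parse_vib_list(ESXI_67)))
```

Under that assumption, only vmw-ahci differs between the two hosts (1.0.0-39 on 6.5, 1.2.0-6 on 6.7), while sata-ahci stays at 3.0-26.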

I have tried disabling vmw-ahci as described in https://www.virtuallifestyle.nl/2017/05/fixing-issues-rdms-upgrading-esxi-6-5-0d/ but that did not solve the issue either.
