VMware Cloud Community
GovM
Contributor
Contributor

P2000 G3 FC Disk Latency ESXi 5.5

Hi All,

First and foremost hello to everyone in the VMware community, new here and this my first post and i am hoping someone will be able to assist me as i am at wits end.

Our Setup

Hosts : DL380 Gen8's running esxi 5.5

HBA's : Emulex Lpe12000 8GB

Switch : HP SAN 8/8 Fabric switch

SAN : P2000 G3 FC ( 3 Enclosures with 10 x 900GB 10K rpm disks in each )

I have noticed high disk latency on a couple of our servers, for example on a SQL server running 1 DB the drive holding that DB is generating latency of 60ms, we have another server where we a have raw lun mapped for file share purposes and writing back to that lun is generating latency of around 60ms+

Hardware acceleration is enabled along with storage I/O. We are using raid 5 Vdisks with a couple of small 2TB volumes approx 3 presented to the ESXi hosts to form a storage cluster.

I've tried everything to reduce the latency from ensuring there is adequate ram/cpu that the disks are eager zero and the drives are split out using multi scsi controllers etc....

I've also done the following as per HP's recommended best practices for the HP P2000 G3 SAN

Change default PSP path to Round Robin for HP P2000 G3

esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_ALUA

Next ran this to set the psp for all existing volumes to round robin.

for i in `esxcli storage nmp device list | grep naa.600` ; do esxcli storage nmp device set --device $i --psp VMW_PSP_RR; done

Finally set the path-change-frequency from 1000 IOps (default) to 1 so every IO the other (optimized) path is used

for i in `esxcli storage nmp device list | grep naa.600` ; do esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d $i; done


This has made no difference to the latency

Also installed the latest VAAI.

I've taken some random  snippets from two hosts

The VM Kernel logs for host 2 are reporting

2014-01-14T20:18:45.807Z cpu8:33531)WARNING: Migrate: 262: Invalid message type for new connection: 542393671.  Expecting message of type INIT (0).

2014-01-14T20:19:01.661Z cpu12:34002)Config: 346: "HostLocalSwapDirEnabled" = 0, Old Value: 0, (Status: 0x0)

2014-01-14T20:21:02.771Z cpu6:32811)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e803e8cc0, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:21:02.771Z cpu6:32811)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:21:02.771Z cpu6:32811)ScsiDeviceIO: 2337: Cmd(0x412e803e8cc0) 0x1a, CmdSN 0x21cb from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:21:02.787Z cpu6:32811)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e803e8cc0, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:21:02.806Z cpu4:32809)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e803e8cc0, 0) to dev "mpx.vmhba35:C0:T0:L0" on path "vmhba35:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2014-01-14T20:21:02.806Z cpu4:32809)ScsiDeviceIO: 2337: Cmd(0x412e803e8cc0) 0x1a, CmdSN 0x21cf from world 0 to dev "mpx.vmhba35:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2014-01-14T20:26:02.773Z cpu3:32808)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e82d28680, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:26:02.773Z cpu3:32808)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:26:02.773Z cpu3:32808)ScsiDeviceIO: 2337: Cmd(0x412e82d28680) 0x1a, CmdSN 0x21e8 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:26:02.792Z cpu3:32808)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e82d28680, 0) to dev "mpx.vmhba35:C0:T0:L0" on path "vmhba35:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2014-01-14T20:26:02.792Z cpu3:32808)ScsiDeviceIO: 2337: Cmd(0x412e82d28680) 0x1a, CmdSN 0x21ee from world 0 to dev "mpx.vmhba35:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

Host 5 is reporting

014-01-14T20:17:00.194Z cpu15:32820)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x2a failed <3/23> sid x010400, did x010000, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:03.400Z cpu16:33500)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x2a failed <1/3> sid x010400, did x010800, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33500)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33500)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.242Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.243Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.243Z cpu10:33322)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:17:35.245Z cpu10:33500)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x28 failed <0/4> sid x010400, did x010900, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:18:09.402Z cpu1:32800)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd x2a failed <1/3> sid x010400, did x010800, oxid xffff SCSI Reservation Conflict -

2014-01-14T20:20:42.407Z cpu6:32811)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 1:(0):3271: FCP cmd xa3 failed <0/1> sid x010400, did x010900, oxid xffff SCSI Chk Cond - Unit Attn: Data(x2:x6:x3f:xe)

2014-01-14T20:20:42.409Z cpu23:514490)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 0:(0):3271: FCP cmd xa3 failed <3/2> sid x010400, did x010900, oxid xffff SCSI Chk Cond - Unit Attn: Data(x2:x6:x3f:xe)

2014-01-14T20:20:42.409Z cpu23:514490)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 0:(0):3271: FCP cmd xa3 failed <3/1> sid x010400, did x010900, oxid xffff SCSI Chk Cond - Unit Attn: Data(x2:x6:x3f:xe)

2014-01-14T20:20:42.409Z cpu23:514490)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 0:(0):3271: FCP cmd xa3 failed <3/8> sid x010400, did x010900, oxid xffff SCSI Chk Cond - Unit Attn: Data(x2:x6:x3f:xe)

2014-01-14T20:20:42.409Z cpu23:33320)lpfc: lpfc_scsi_cmd_iocb_cmpl:2145: 0:(0):3271: FCP cmd xa3 failed <3/0> sid x010400, did x010900, oxid xffff SCSI Chk Cond - Unit Attn: Data(x2:x6:x3f:xe)

2014-01-14T20:20:42.414Z cpu19:32824)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc00128c0, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:20:42.414Z cpu19:32824)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:20:42.414Z cpu19:32824)ScsiDeviceIO: 2337: Cmd(0x412fc00128c0) 0x1a, CmdSN 0x888e from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:25:42.423Z cpu21:32826)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc6702c80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x4 0x0 0x0. Act:EVAL

2014-01-14T20:25:42.423Z cpu21:32826)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:25:42.423Z cpu21:32826)ScsiDeviceIO: 2337: Cmd(0x412fc6702c80) 0x1a, CmdSN 0x88aa from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x4 0x0 0x0.

2014-01-14T20:30:42.432Z cpu21:32826)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc42be600, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:30:42.432Z cpu21:32826)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:30:42.432Z cpu21:32826)ScsiDeviceIO: 2337: Cmd(0x412fc42be600) 0x1a, CmdSN 0x88c2 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:32:07.873Z cpu11:32816)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e8a0d9ec0, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:32:07.873Z cpu11:32816)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:32:07.873Z cpu11:32816)ScsiDeviceIO: 2337: Cmd(0x412e8a0d9ec0) 0x1a, CmdSN 0x88cc from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:32:07.891Z cpu11:32816)ScsiDeviceIO: 2337: Cmd(0x412e8a0d9ec0) 0x1a, CmdSN 0x88cd from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:35:42.441Z cpu19:32824)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc6531c00, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:35:42.441Z cpu19:32824)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:35:42.441Z cpu19:32824)ScsiDeviceIO: 2337: Cmd(0x412fc6531c00) 0x1a, CmdSN 0x88db from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2014-01-14T20:40:42.447Z cpu12:32817)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc23d6200, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0. Act:EVAL

2014-01-14T20:40:42.447Z cpu12:32817)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:40:42.447Z cpu12:32817)ScsiDeviceIO: 2337: Cmd(0x412fc23d6200) 0x1a, CmdSN 0x88f1 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0.

2014-01-14T20:45:42.457Z cpu19:32824)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412fc36e9240, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2014-01-14T20:45:42.457Z cpu19:32824)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "mpx.vmhba32:C0:T0:L0" state in doubt; requested fast path state update...

2014-01-14T20:45:42.457Z cpu19:32824)ScsiDeviceIO: 2337: Cmd(0x412fc36e9240) 0x1a, CmdSN 0x8907 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Attached some pictures

Latency 1 : this is a server using VMDK's only no RDM's

Latency 2 : this server has an RDM attached (scsi1:0) latency is the same

I've also updated on host 2 the HBA firmware to Firmware 2.01a12 which is the latest one available for the LPE12000 this again has made no difference

The SAN management controllers are running the latest firmware available also

latency1.JPGlatency2.JPGhba.JPG p2000_management.JPGp2000_vdisk_overview.JPG

Any help would be greatly appreciated please.

0 Kudos
8 Replies
eyal_reduxio
Contributor
Contributor

Hi,

What is your RAID configuration on the P2000? Is your datastore spread across all disks? Ideally you'd like to have a datastore across all of your spindles.

One idea I can think of is for you to Storage vMotion one of VMs suffering the latency to the local disks of the ESXi servers to see if it helps.

Eyal Traitel

Director of Marketing

Reduxio Systems, http://www.reduxio.com

c: +972-58-7300-749

e: eyal@reduxio.com

0 Kudos
bl0ckh3ad
Enthusiast
Enthusiast

Hi GovM

Just curious of you ever got this resolved?

I'm seeing a similar issue and suspecting HBA issue.

0 Kudos
raldridge254
Enthusiast
Enthusiast

Did you ever get to the bottom of this?

I've having the exact same problem.  Last time it resulted in the hostd becoming unresponsive and having to restart the server by ILO.

So anything you found would be great!

0 Kudos
bl0ckh3ad
Enthusiast
Enthusiast

Firmware issue with the Emulex hba’s

I’d recommend opening a case with Emulex and request to escalate to engineering for review

0 Kudos
levonti
Contributor
Contributor

Do you have any updates of your problem?

I have a similar problem with HP 3PAR and HP Proliant 460 G7/G8:

2015-02-05T10:32:14.430Z cpu11:98640)lpfc: lpfc_scsi_cmd_iocb_cmpl:2185: 0:(0):3271: FCP cmd x2a failed <5/2> sid x080c30, did x060600, oxid xba iotag x457 Time Out Returning Host Busy

2015-02-05T10:32:14.430Z cpu10:33578)NMP: nmp_ThrottleLogForDevice:2322: Cmd 0x2a (0x412e403dd540, 272365) to dev "naa.60002ac000000000000000040000561e" on path "vmhba0:C0:T5:L2" Failed: H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0. Act:EVAL

2015-02-05T10:32:14.430Z cpu10:33578)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60002ac000000000000000040000561e" state in doubt; requested fast path state update...

2015-02-05T10:32:14.430Z cpu10:33578)ScsiDeviceIO: 2325: Cmd(0x412e403dd540) 0x2a, CmdSN 0x800e0038 from world 272365 to dev "naa.60002ac000000000000000040000561e" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-02-05T10:32:14.432Z cpu11:98640)lpfc: lpfc_scsi_cmd_iocb_cmpl:2185: 0:(0):3271: FCP cmd x2a failed <5/2> sid x080c30, did x060600, oxid xbe iotag x45b Time Out Returning Host Busy

2015-02-05T10:32:14.432Z cpu10:33578)ScsiDeviceIO: 2325: Cmd(0x412e4042cb80) 0x2a, CmdSN 0x800e0017 from world 272365 to dev "naa.60002ac000000000000000040000561e" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

2015-02-05T10:32:14.439Z cpu11:98640)lpfc: lpfc_scsi_cmd_iocb_cmpl:2185: 0:(0):3271: FCP cmd x2a failed <3/2> sid x080c30, did x060700, oxid xcc iotag x469 Time Out Returning Host Busy

2015-02-05T10:32:14.539Z cpu11:98640)lpfc: lpfc_scsi_cmd_iocb_cmpl:2185: 0:(0):3271: FCP cmd x2a failed <3/2> sid x080c30, did x060700, oxid x199 iotag x536 Time Out Returning Host Busy

in this time on the vm host iotop shows:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util

sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     3.00    0.00   0.00   100.00

and we have freeze on the system.:smileyangry:

0 Kudos
levonti
Contributor
Contributor

No, I know about this bug and we have previously updated lpfc driver to version 10.2.455.0.

My problem was in the IO, KBPS and Queue length spikes on the reports with 2 seconds interval.

We found the host that generated the burst of IO activity and limited its. The problem with IO pauses on VM and Time Out Returning Host Busy on the ESXi ended for me.

0 Kudos
LukaszWarchol
Contributor
Contributor

Hello,

We had same issue. After few hours of searching the cause, trying different solutions, etc, we have found out that we had zoning issue from storage side. As in our case storage is ALUA, it requires 2 paths per HBA (or more but all double). We had only 1 and it was causing that all hosts were looking for 2nd path. After some time of asking for this 2nd path ESXi started to be overloaded and freeze... Adding correct pathing for ALUA requirements (multipathing was RR from our side) resolved our issue.

I don't how it was in your case, but it might me worth to check or helpful for other persons.

0 Kudos