Gabrie1
Commander

VMs very slow on IBM x3650 M4 with ESXi 5.1

Hi

For a customer I just installed a vSphere 5.1 essentials environment. Two IBM x3650M4 hosts connected over ONE path to an IBM DS3512 storage. Soon a second HBA will be added to the configuration and we'll have two paths.

Before installing I checked the BIOS versions of the ESXi hosts. According to the VMware HCL, the BIOS version should be IBM-[VVE112H]. According to the IMM, the hosts are running:

UEFI (Active)    1.21    VVE120EUS    11 Oct 2012

The problem we're experiencing is that all disk operations are very slow. In the vmkernel.log I can see a lot of path failovers, which would explain why the system is slow. (See the log at the end of this post.)

When searching for these errors, I found this community post: http://communities.vmware.com/thread/341512

They refer to this VMware KB http://kb.vmware.com/kb/1030265

In that KB there is the following note:

"Note: This issue only applies if you see this specific alert in the vmkernel/messages log files: ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40. If you do not see this message, you are not experiencing this issue."

Questions:

- We can't find that message (ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40) in the vmkernel logs. Is this enough to state this KB does not apply?

- What is the impact if we DO apply the recommended solution on the ESXi 5.1 hosts?
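
For the first question, here is a minimal Python sketch of how a copied log could be scanned for that alert. Assumptions on my side: the log is available as plain text, and the line number (1823) and APIC ID may differ between hosts, so the pattern is looser than the literal string quoted in the KB. The function name is only illustrative.

```python
import re

# Pattern based on the alert quoted in the KB note. Allowing any line number
# and any APIC ID is an assumption; the KB only quotes one specific string.
APIC_ALERT = re.compile(r"ALERT: APIC: \d+: APICID 0x[0-9a-fA-F]+ - ESR = 0x40")


def find_apic_alerts(log_lines):
    """Return every log line that contains the APIC ESR = 0x40 alert."""
    return [line for line in log_lines if APIC_ALERT.search(line)]
```

Called as `find_apic_alerts(open("vmkernel.log"))`, an empty list would mean the alert is not present in that file. Rotated log files would have to be checked separately.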

Regards

Gabrie

Below is the vmkernel.log:

2013-02-01T17:05:30.373Z cpu22:8214)NMP: nmpCompleteRetryForPath:321: Retry world recovered device "naa.60080e50002fb706000002635107aa63"
2013-02-01T17:05:30.796Z cpu15:8374)NMP: nmp_DeviceUpdatePathStates:615: Activated path "vmhba2:C0:T0:L1" for NMP device "naa.60080e50002f7b320000028c5107a9ff".
2013-02-01T17:05:30.797Z cpu0:10554)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa.60080e50002f7b320000028c5107a9ff" - issuing command 0x412401f91300
2013-02-01T17:05:30.797Z cpu10:13686)NMP: nmpCompleteRetryForPath:321: Retry world recovered device "naa.60080e50002f7b320000028c5107a9ff"
2013-02-01T17:05:32.520Z cpu20:217782)VMW_SATP_LSI: satp_lsi_pathFailure:1120: Command 0x8a to naa.60080e50002fb706000002635107aa63 (fcf 0) failed with NOT_READY (0x2/0x4/0x1), on path vmhba2:C0:T0:L2 (pnr 1, iet 0xac848a2)
2013-02-01T17:05:32.520Z cpu20:217782)ScsiDeviceIO: 2303: Cmd(0x4124403dcb00) 0x8a, CmdSN 0x8000002c from world 217778 to dev "naa.60080e50002fb706000002635107aa63" failed H:0x0 D:0x2 P:0x4 Possible sense data: 0x2 0x4 0x1.
2013-02-01T17:05:32.520Z cpu22:8214)ScsiDeviceIO: 2303: Cmd(0x4124403f8200) 0x8a, CmdSN 0x80000041 from world 217778 to dev "naa.60080e50002fb706000002635107aa63" failed H:0x0 D:0x2 P:0x4 Possible sense data: 0x2 0x4 0x1.
2013-02-01T17:05:33.039Z cpu22:8214)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x8a (0x4124403dcb00, 217778) to dev "naa.60080e50002fb706000002635107aa63" on path "vmhba2:C0:T0:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1. Act:FAILOVER
2013-02-01T17:05:33.039Z cpu22:8214)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.60080e50002fb706000002635107aa63": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2013-02-01T17:05:33.039Z cpu22:8214)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x8a (0x4124403f8200, 217778) to dev "naa.60080e50002fb706000002635107aa63" on path "vmhba2:C0:T0:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1. Act:FAILOVER
2013-02-01T17:05:33.039Z cpu22:8214)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x8a (0x4124425c4e40, 217778) to dev "naa.60080e50002fb706000002635107aa63" on path "vmhba2:C0:T0:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1. Act:FAILOVER
2013-02-01T17:05:33.306Z cpu3:8195)NMP: nmp_ThrottleLogForDevice:2319: Cmd 0x2a (0x412402b82c00, 11035) to dev "naa.60080e50002f7b320000028c5107a9ff" on path "vmhba2:C0:T0:L1" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x94 0x1. Act:FAILOVER
2013-02-01T17:05:33.306Z cpu3:8195)WARNING: NMP: nmp_DeviceRetryCommand:133:Device "naa.60080e50002f7b320000028c5107a9ff": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
2013-02-01T17:05:34.496Z cpu3:8375)NMP: nmp_DeviceUpdatePathStates:615: Activated path "vmhba2:C0:T0:L2" for NMP device "naa.60080e50002fb706000002635107aa63".
2013-02-01T17:05:34.497Z cpu1:8786)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa.60080e50002fb706000002635107aa63" - issuing command 0x4124403dcb00
2013-02-01T17:05:34.506Z cpu22:8214)NMP: nmpCompleteRetryForPath:321: Retry world recovered device "naa.60080e50002fb706000002635107aa63"
2013-02-01T17:05:34.795Z cpu2:8369)NMP: nmp_DeviceUpdatePathStates:615: Activated path "vmhba2:C0:T0:L1" for NMP device "naa.60080e50002f7b320000028c5107a9ff".
2013-02-01T17:05:34.796Z cpu0:10554)WARNING: NMP: nmpDeviceAttemptFailover:599:Retry world failover device "naa.60080e50002f7b320000028c5107a9ff" - issuing command 0x412402b82c00
2013-02-01T17:05:34.796Z cpu4:8634)NMP: nmpCompleteRetryForPath:321: Retry world recovered device "naa.60080e50002f7b320000028c5107a9ff"
~ #
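
To get a feel for how often this happens per LUN, the failover lines can be counted per device and path. This is only a rough sketch in Python: the regular expression is derived from the `nmp_ThrottleLogForDevice` lines above, and the function name is illustrative. Because that log function throttles its output, the counts are a lower bound rather than an exact number.

```python
import re
from collections import Counter

# Matches the "... to dev "<naa>" on path "<path>" Failed: ... Act:FAILOVER"
# lines shown in the log excerpt above.
FAILOVER = re.compile(
    r'to dev "(?P<dev>naa\.[0-9a-f]+)" on path "(?P<path>[^"]+)" Failed: .*'
    r'sense data: (?P<sense>0x[0-9a-f]+ 0x[0-9a-f]+ 0x[0-9a-f]+)\. Act:FAILOVER'
)


def count_failovers(log_lines):
    """Count logged failover actions per (device, path, sense data) tuple."""
    counts = Counter()
    for line in log_lines:
        match = FAILOVER.search(line)
        if match:
            key = (match.group("dev"), match.group("path"), match.group("sense"))
            counts[key] += 1
    return counts
```

In the excerpt above, both LUNs report the same sense data (0x5 0x94 0x1) on the same HBA, so the failovers are not limited to a single LUN.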
http://www.GabesVirtualWorld.com
ragmon
Enthusiast

Hi,

What HBAs are you using? Are you using IBM provided drivers or VMware in-box drivers?

Gabrie1
Commander

vmhba2  mpt2sas           link-n/a  sas.500605b005665230                    (0:27:0.0) LSI Logic / Symbios Logic LSI2008

# vmkload_mod -s mpt2sas
vmkload_mod module information
input file: /usr/lib/vmware/vmkmod/mpt2sas
Version: Version 10.00.00.00.5vmw, Build: 799733, Interface: 9.2 Built on: Aug  1 2012
License: GPL
Required name-spaces:
  com.vmware.driverAPI#9.2.1.0
  com.vmware.vmkapi#v2_1_0_0
Parameters:
  heap_max: int
    Maximum attainable heap size for the driver.
  heap_initial: int
    Initial heap size allocated for the driver.
  max_sectors: short
    max sectors, range 64 to 8192 default=8192
  max_lun: int
     max lun, default=16895
  command_retry_count: int
     Device discovery TUR command retry count: (default=144)
  logging_level: int
     bits for enabling additional logging info (default=0)
  mpt2sas_raid_queue_depth: int
     Max RAID Device Queue Depth (default=128)
  mpt2sas_sata_queue_depth: int
     Max SATA Device Queue Depth (default=32)
  mpt2sas_sas_queue_depth: int
     Max SAS Device Queue Depth (default=254)
  disable_discovery: int
     disable discovery
  mpt2sas_fwfault_debug: int
     enable detection of firmware fault and halt firmware - (default=0)
  diag_buffer_enable: int
     post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0)
  missing_delay: array of int
     device missing delay , io missing delay
  msix_disable: int
     disable msix routed interrupts (default=0)
  max_sgl_entries: int
     max sg entries
  max_queue_depth: int
     max controller queue depth (default=600)

Gabrie1
Commander

IBM Support told us the single HBA connected directly to the DS3512 is not the correct configuration. There should be a switch in between them. We've decided to try the iSCSI route.

gkn1
Contributor

Hi, I've installed several of these boxes without problems (and without a switch when using SAS). Just be aware that IBM does not support the DS35xx with firmware 7.83 when using vSphere 5.1, only 7.84.xx or later. Also have a look at this VMware KB: kb.vmware.com/kb/2039608

Gert Kjerslev
GFK
Enthusiast

Sorry for the double post, was on with my old account.

Gert Kjerslev | If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
Gabrie1
Commander

Hi

It is on firmware 7.84; I checked this against the VMware HCL.

Can you explain how you connected the box? Currently each host has ONE HBA connected over SAS to the DS3512. After connecting, the SATP shows as LSI and the PSP as MRU, whereas the VMware documentation suggests using ALUA. The supplier of the DS3512 didn't know how to enable this on the LUNs.

Gabrie


GFK
Enthusiast

I assume that you have created a host group on the DS3512? You can then add a host to this group. This is also where you select the host OS type: choose VMWARE, not ALUA. Physically, connect your HBA port 1 to DS3512 controller port 1 and so on. Do one at a time, so you do not mix up the WWNs.

Gert Kjerslev
Gabrie1
Commander

I think we found the problem. Host1 was connected through SAS to Storage Processor A and Host2 to Storage Processor B. It seems to work like EMC storage, and the LUNs kept trespassing between the two storage processors. Once we connected Host2 to Storage Processor A as well, performance was back and there were no more messages in the vmkernel log.

Stupid that we didn't think of this earlier, because I usually do check this with EMC storage. I'm just not used to IBM SAS storage.

Thank you for your help.
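
For anyone who finds this thread later, the trespassing described above can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how the DS3512 firmware actually works: it assumes a LUN is owned by exactly one controller and that any I/O arriving through the other controller forces an ownership change. All names are made up for the illustration.

```python
def count_ownership_changes(io_sequence, host_to_controller, initial_owner="A"):
    """Count how often LUN ownership moves between controllers (toy model).

    io_sequence: the hosts issuing I/O to one shared LUN, in order.
    host_to_controller: which controller each host is cabled to.
    """
    owner = initial_owner
    changes = 0
    for host in io_sequence:
        target = host_to_controller[host]
        if target != owner:
            # I/O arrived via the non-owning controller: ownership moves.
            owner = target
            changes += 1
    return changes


# Two hosts alternating I/O on the same LUN.
io = ["host1", "host2"] * 4

split_cabling = {"host1": "A", "host2": "B"}  # our original setup
same_cabling = {"host1": "A", "host2": "A"}   # after recabling host2

print(count_ownership_changes(io, split_cabling))  # 7
print(count_ownership_changes(io, same_cabling))   # 0
```

With the split cabling almost every I/O moves the LUN to the other controller, which is the ping-pong we saw in the vmkernel log. With both hosts on the same controller the ownership never changes.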

GFK
Enthusiast

Great that you got it working :) Normally I set up host1 HBA1 to controller A port 1 and host1 HBA1 to controller B port 1, and it works fine with no errors.

Gert Kjerslev