Hi
For a customer I just installed a vSphere 5.1 Essentials environment: two IBM x3650 M4 hosts, each connected over ONE path to an IBM DS3512 storage array. Soon a second HBA will be added to the configuration and we'll have two paths.
Before installing, I checked the BIOS versions of the ESXi hosts. According to the VMware HCL, the BIOS version should be IBM-[VVE112H]. According to the IMM, the hosts are running:
UEFI (Active) 1.21 VVE120EUS 11 Oct 2012
The problem we're experiencing is that all disk operations are very slow. In vmkernel.log I can see a lot of path failovers, which would explain the slowness (see the logs at the end of this post).
When searching for these errors, I found this community post: http://communities.vmware.com/thread/341512
They refer to this VMware KB http://kb.vmware.com/kb/1030265
In that KB there is the following note:
"Note: This issue only applies if you see this specific alert in the vmkernel/messages log files: ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40. If you do not see this message, you are not experiencing this issue."
Questions:
- We can't find that message (ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40) in the vmkernel logs. Is that enough to conclude that this KB does not apply?
- What is the impact if we DO apply the recommended solution on the ESXi 5.1 hosts?
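For the first question, a quick way to apply the KB's own criterion on each host is to search the kernel log for the exact alert string. The sketch below is a POSIX `sh` function (the name `check_apic_alert` is hypothetical, not a VMware tool); it assumes the ESXi 5.x log location `/var/log/vmkernel.log` and does not look inside rotated, gzip-compressed logs.

```shell
#!/bin/sh
# Sketch: look for the APIC alert quoted in VMware KB 1030265 in a vmkernel log.
# Assumption: on ESXi 5.x the current log is /var/log/vmkernel.log; rotated,
# gzip-compressed copies are not searched by this function.
# Return status: 0 = alert found, 1 = alert not found, 2 = log not readable.
check_apic_alert() {
  log="${1:-/var/log/vmkernel.log}"
  alert='APIC: 1823: APICID 0x00000000 - ESR = 0x40'

  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi

  # -F treats the alert as a fixed string, -q only sets the exit status.
  if grep -F -q "$alert" "$log"; then
    echo "KB 1030265 alert found in $log"
    return 0
  fi

  echo "KB 1030265 alert not found in $log"
  return 1
}
```

Paste the function into the ESXi Shell and run `check_apic_alert` (or pass the path of a copied log). Per the KB note quoted above, a return status of 1 means the KB does not apply; it says nothing about other possible causes of the path failovers.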
Regards
Gabrie
Below is the vmkernel.log:
Hi,
What HBAs are you using? Are you using IBM-provided drivers or VMware in-box drivers?
vmhba2 mpt2sas link-n/a sas.500605b005665230 (0:27:0.0) LSI Logic / Symbios Logic LSI2008
# vmkload_mod -s mpt2sas
vmkload_mod module information
  input file: /usr/lib/vmware/vmkmod/mpt2sas
  Version: Version 10.00.00.00.5vmw, Build: 799733, Interface: 9.2 Built on: Aug 1 2012
  License: GPL
  Required name-spaces:
    com.vmware.driverAPI#9.2.1.0
    com.vmware.vmkapi#v2_1_0_0
  Parameters:
    heap_max: int
      Maximum attainable heap size for the driver.
    heap_initial: int
      Initial heap size allocated for the driver.
    max_sectors: short
      max sectors, range 64 to 8192 default=8192
    max_lun: int
      max lun, default=16895
    command_retry_count: int
      Device discovery TUR command retry count: (default=144)
    logging_level: int
      bits for enabling additional logging info (default=0)
    mpt2sas_raid_queue_depth: int
      Max RAID Device Queue Depth (default=128)
    mpt2sas_sata_queue_depth: int
      Max SATA Device Queue Depth (default=32)
    mpt2sas_sas_queue_depth: int
      Max SAS Device Queue Depth (default=254)
    disable_discovery: int
      disable discovery
    mpt2sas_fwfault_debug: int
      enable detection of firmware fault and halt firmware - (default=0)
    diag_buffer_enable: int
      post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0)
    missing_delay: array of int
      device missing delay , io missing delay
    msix_disable: int
      disable msix routed interrupts (default=0)
    max_sgl_entries: int
      max sg entries
    max_queue_depth: int
      max controller queue depth (default=600)
IBM Support told us that the single HBA connected directly to the DS3512 is not the correct configuration and that there should be a switch between them. We've decided to try the iSCSI route.
Hi, I've installed several of these boxes without problems (and without a switch when using SAS). Just be aware that IBM does not support the DS35xx with firmware 7.83 when using vSphere 5.1, only 7.84.xx or later. Also have a look at this VMware KB: kb.vmware.com/kb/2039608
Hi
It is on firmware 7.84; I checked this against the VMware HCL.
Can you explain how you connected the box? Currently each host has ONE HBA connected over SAS to the DS3512. After connecting, the SATP shows LSI and the PSP shows MRU, whereas the VMware documentation suggests using ALUA. However, the supplier of the DS3512 didn't know how to enable ALUA on the LUNs.
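To check which SATP and PSP each LUN actually ended up with, the output of `esxcli storage nmp device list` can be condensed to one line per device. This is a sketch under assumptions: the function name `summarize_nmp` is hypothetical, and the parsing relies on the ESXi 5.x text layout (an unindented device ID followed by indented `Storage Array Type:` and `Path Selection Policy:` lines). The SATP and PSP described above as LSI and MRU would show up as `VMW_SATP_LSI` and `VMW_PSP_MRU`.

```shell
#!/bin/sh
# Sketch: print one line per device with its SATP and PSP, reading the text
# output of `esxcli storage nmp device list` on stdin.
# Assumption: the ESXi 5.x layout, where each device block starts with an
# unindented device ID and contains indented "Key: value" lines.
summarize_nmp() {
  awk '
    # An unindented, non-empty line starts a new device block.
    /^[^ \t]/ { dev = $1 }

    # "Storage Array Type:" (not the "Type Device Config:" line) holds the SATP.
    /Storage Array Type:/ {
      sub(/.*Storage Array Type:[ \t]*/, "")
      satp[dev] = $0
    }

    # "Path Selection Policy:" holds the PSP; assumes the SATP line came first.
    /Path Selection Policy:/ {
      sub(/.*Path Selection Policy:[ \t]*/, "")
      print dev, satp[dev], $0
    }
  '
}
```

Run it as `esxcli storage nmp device list | summarize_nmp`. Each output line lists the device ID, its SATP and its PSP, which makes it easy to compare what both hosts report for the same LUNs.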
Gabrie
I assume you have created a host group on the DS3512? Then you can add a host to this group. This is also where you select the host OS type: VMWARE, not ALUA. Physically, connect your HBA port 1 to DS3512 controller port 1 and so on, one at a time, so you don't mix up the WWNs.
I think we found the problem. Host1 was connected over SAS to Storage Processor A and Host2 to Storage Processor B. It seems to behave like EMC storage: the LUNs kept trespassing between the storage processors. Once we connected Host2 to Storage Processor A as well, the performance was back and there were no more messages in the vmkernel log.
Stupid that we didn't think of this before, because I usually do check this with EMC storage. I'm just not used to IBM SAS storage.
Thank you for your help.
Great that you got it working. Normally I set up host1 HBA1 to controller A port 1 and host1 HBA1 to controller B port 1, and it works fine with no errors.