Hello and welcome to the forums.
Note: This discussion was moved from the VMware ESXi 4 community to the VMware vSphere Storage community.
Good Luck!
I thought I would have more replies on this! But I guess it has been beaten to death, or maybe my posting style is no good... Anyway, while looking at different things to try to track down an issue with my storage config, I noticed something in esx.conf that seemed weird to me:
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000e6]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000e6]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000e8]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000e8]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000ea]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000ea]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f0]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f0]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f3]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f3]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f6]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000000f6]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000126]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000126]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000138]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000138]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b000000000000013b]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b000000000000013b]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b000000000000013e]/preferred = "iqn.1998-01.com.vmware:localhost-5c1cc3b0-00023d000001,iqn.2003-10.com.lefthandnetworks:companynamehere:318:backuptodisk,t,1-naa.6000eb3891805a5b000000000000013e"
/storage/plugin/NMP/device[naa.6000eb3891805a5b000000000000013e]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b000000000000013e]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000141]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000141]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000223]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000223]/rrIops = "1"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002d8]/preferred = "iqn.1998-01.com.vmware:localhost-5c1cc3b0-00023d000004,iqn.2003-10.com.lefthandnetworks:companynamehere:728:backuptodiskdb,t,1-naa.6000eb3891805a5b00000000000002d8"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002d8]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002db]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002de]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002e0]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002e2]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002e4]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b00000000000002e6]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000772]/psp = "VMW_PSP_RR"
/storage/plugin/NMP/device[naa.6000eb3891805a5b0000000000000772]/rrIops = "1"
/storage/swIscsi/enabled = "true"
Most of the devices listed above are in fact presented to each host in my 3-host cluster; one of them is not. Could it be possible that ESXi is trying to access that device somehow even though it doesn't exist when I look in vCenter? Also, should I have any "preferred" lines in the file at all, since I am using Round Robin as my PSP?
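For reference, a quick way to cross-check the esx.conf entries against what a host actually sees is from the Tech Support Mode shell, with something along these lines (generic ESXi 4.x commands, nothing LeftHand-specific):
esxcfg-scsidevs -c     (compact list of every SCSI device the host currently sees)
esxcli nmp device list     (per-device path selection policy and Round Robin settings)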
Thanks in advance for any insight or feedback you can give!
We recently purchased an 8-node LeftHand P4500 SAN. We configured our environment similarly to yours, with a few exceptions:
* We decided not to use Jumbo Frames; we would have if our SAN fabric were 10Gb Ethernet
* We left the Round Robin IOPS path-changeover setting at the default of 1000 IOPS instead of changing it to 1 as you have (see the sketch at the end of this post)
Duncan Epping (Yellow-Bricks blog) talked about the IOPS=1 setting in a blog entry about 17 months ago at
http://www.yellow-bricks.com/2010/03/30/whats-the-point-of-setting-iops1/
* We left the default iSCSI Port Group settings alone and did not set a load balancing policy exception for IP Hash as you did.
* We are thinking about moving our SQL server to the Lefthand (it is on a Proliant server using DAS right now), but haven't done it yet.
The HP Assessing Performance in Lefthand SANs whitepaper talks about identifying bottlenecks in MS SQL environments on page 6 using the Microsoft SQLIOStress application.
HP's White Paper is at: http://h20195.www2.hp.com/v2/GetPDF.aspx/c01770507.pdf
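In case it is useful, the IOPS=1 change Duncan describes is made per device from the console on ESX/ESXi 4.x, roughly like this (the NAA ID below is just one taken from your esx.conf excerpt as an example):
esxcli nmp device setpolicy --device naa.6000eb3891805a5b00000000000000e6 --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device naa.6000eb3891805a5b00000000000000e6 --type iops --iops 1
The first command puts the device on Round Robin, the second lowers the path-switch threshold from the default 1000 I/Os to 1.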
"4 teamed NICs for iSCSI" doesn't sound like Multipathing.
http://fojta.wordpress.com/2010/04/13/iscsi-and-esxi-multipathing-and-jumbo-frames/
Have a look there.
The "route by IP hash" should not be relevant after following that guide.
The following documents may be of some assistance to you.
I would have thought that 2 x 1Gb NICs would have been enough for your storage network per host.
Configuring LeftHand with vSphere implementation document:
http://www.scribd.com/doc/24586958/Configuring-Left-Hand-ISCSI-and-VSPHERE-MPIO
Multi-vendor iSCSI SAN with vSphere implementation document:
I guess I assumed that when you said 4 teamed NICs for iSCSI, you meant 4 pNICs on 1 vSwitch, but with 4 VMkernel port groups, each running 1 active and 3 unused ... alternating through all 4 port groups?
Is that correct?
Matt
Hi Matt. Thanks for your response and sorry for my delay. Yes, the iSCSI vSwitch has 4 vmnics and 4 vmks. Each vmk has 1 vmnic listed as active in teaming, and all other NICs for that vmk are listed as unused. For the next vmk, another vmnic is listed as active and the others are unused, and so on, so it's a 1-to-1 relationship. I'm using the Round Robin PSP with IOPS set to 1, so after each I/O it moves on to the next path. I have tried some different IOPS values to see what would yield the best results, and they all stay around 117MB/sec. I had a Cisco guy in yesterday to confirm the networking is correct. I have a support request open with VMware, and this is what they said:
"Hello Jamey,
Thank you for your Support Request. In summary of our findings during the WebEx troubleshooting session: the 114 or so MB/sec you are getting with that test is to be expected, even with Round Robin load balancing. This is a multipathing technology to load-balance multiple streams of I/O, but it does not increase the throughput of any SINGLE stream of I/O. Any single read/write stream is still only going to move as fast as the line it is on. In this case, that line is 1Gbps, which works out to about what you are getting (~120MB/sec).
True I/O increase through multipathing cannot be achieved with the technology we have implemented in ESX, but there are third-party solutions. I do not know of any for the LeftHand SAN, but you could look into it. Another option, and most likely the more viable one, would be to check whether the SAN can handle 10Gb connections and upgrade to 10Gb NICs and connections to the SAN.
Please let me know if you need any further clarification regarding this matter. Thank you for utilizing VMware Technical Support!"
Here is my response to their determination.......
"Hi and thanks for your response. I have a couple questions.
First, I want to make sure I understand correctly: there is no way I can get more than 112MB/sec of throughput when I am using iSCSI with 1Gb NICs in ESXi, no matter what version or configuration, right?
Questions:
-Is this a limitation of my edition of ESXi 4.1 (Advanced)?
-What guest OS iSCSI initiators are supported by VMware? The MS iSCSI initiator, HP DSM for MPIO, etc.? I am using an HP P4500 G2 SAN.
-If I use an iSCSI initiator within the guest OS to get past the 112MB/sec, can I still use vMotion, HA and DRS?
-Using a third-party, guest-OS-based iSCSI initiator for the local VM disks of a SQL server, will I be able to get more than the 112MB/sec throughput cap I get via native multipathing in ESXi?
Thanks in advance for your help."
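For anyone wanting to verify the same path setup on their own hosts, the listing below shows every device and the paths behind it (generic ESXi 4.x, nothing vendor-specific):
esxcfg-mpath -b     (brief listing of every device and the paths behind it)
Watching esxtop in the network view (press n) during a test also shows whether traffic is actually spreading across all of the iSCSI vmnics.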
Does anyone have any other info or input? If we can't get over the 117MB/sec throughput, we are going to look at a higher-powered solution (10Gb Fibre Channel).
Thanks again for your input and info!
I wanted to follow this up with a status post. After reviewing the original resource-utilization report that the vendor used to base their virtual infrastructure hardware recommendation on, we found that they put the peak I/O throughput at 14MB/sec. I have perfmon logs going back a year that show our I/O profile has spikes of up to 400MB/sec during regular hours and up to 900MB/sec during backups. So all the hardware they recommended, 4 HP LeftHand P4500 nodes (12 x 600GB 15k SAS disks and two 1Gb NICs each), 1Gb switches and 1Gb NICs in the hosts, was way under what we need for acceptable performance in a virtual environment.
A VMware engineer elaborated on how Round Robin works and told me that I/O throughput is limited to the 1Gb NIC PER LUN, even though we have 6 NICs in each host.
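My back-of-the-envelope math on that ceiling (just line-rate arithmetic, nothing official):
1Gbps = 1,000,000,000 bits/sec / 8 = 125MB/sec theoretical line rate
minus Ethernet/TCP/iSCSI protocol overhead = roughly 110-120MB/sec usable
which lines up with the ~117MB/sec we keep hitting on a single path.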
My Tests:
-I used SQLIO to do a sequential read against one disk sitting on one LUN that was presented to the VM through ESXi. The throughput was around 130MB/sec.
-I then used SQLIO to run the same sequential read test on a disk comprised of many volumes and many disks (the SQLIO invocation is sketched a little further down). Here is what I did to set up this test:
1. Create ten RAID 10 volumes on the LeftHand SAN
2. Create the datastores
3. Create a virtual disk on each datastore
4. Go into the VM and use Windows Disk Management to create a striped volume across all 10 disks
The most disks I added to the striped volume was 20, and I saw 490MB/sec. I would never have any production SQL Server data\logs sitting on a volume like this; it was strictly a proof of concept.
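For reference, the SQLIO runs were plain sequential reads; the invocation looks something like this (block size, queue depth, duration and the param.txt contents here are illustrative, not necessarily the exact values I used; param.txt just points SQLIO at a pre-created test file):
param.txt:  E:\sqliotest.dat 4 0x0 20480     (test file, 4 threads, affinity mask, 20GB file size)
sqlio -kR -fsequential -b64 -o8 -s120 -LS -Fparam.txt
-kR is a read test, -fsequential for sequential access, -b64 = 64KB blocks, -o8 = 8 outstanding I/Os per thread, -s120 = run for 120 seconds, -LS captures latency.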
Long story long: after the vendor saw our real numbers, they came back and said they do not recommend putting our SQL Server in a virtual environment. They now want to cluster it across two physical hosts and use the LeftHand DSM for MPIO to get the I/O. I believe it is possible for sure, but it would require a more powerful storage network across the SAN, switches and hosts.
It has been a journey and I have learned a TON! All of this started from a little mistake on a little IO profile report!