VMware Cloud Community
alapierre
Contributor

Slow Read/Write Performance over iSCSI SAN.

This is a new setup of ESXi 4.0 running VMs off of a Cybernetics miSAN D iSCSI SAN.

Doing a high-data read test on a VM, it took 8 minutes, versus 1.5 minutes for the same VM located on a slower VMware Server 1.0 host with the VMs on local disk. I'm watching my read speeds from the SAN, and it's getting just over 3 MB/s max read, and Disk Usage on the VM matches at just over 3 MB/s... horribly slow.

The server and SAN are both connected to the same 1 Gb switch. I have followed this guide

virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

to get multipathing set up properly, but I'm still not getting good performance with my VMs. I know the SAN and the network should be able to handle over 100 MB/s, but I'm just not getting it. I have two GigE NICs on the SAN multipathed to two GigE NICs on the ESXi host, one NIC per VMkernel port. Is there something else I can check or do to improve my speed? Thanks in advance for any tips.

35 Replies
J1mbo
Virtuoso

My advice would be to simplify everything as much as possible before attempting multipathing. Use a single NIC (at both ends), disable jumbo frames throughout, create a new LUN and test that. As you say, a single GigE link should be good for about 110 MB/s anyway.
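For reference, the rough arithmetic behind that ~110 MB/s figure looks something like the sketch below (the ~10% allowance for Ethernet/IP/TCP/iSCSI overhead is an assumed ballpark for illustration, not a measured value):

```python
# Rough ceiling for iSCSI throughput over a single gigabit link.
# The protocol-overhead factor is an assumed ballpark, not a measurement.

LINK_SPEED_BITS = 1_000_000_000   # 1 GigE, bits per second
PROTOCOL_OVERHEAD = 0.10          # assumed ~10% lost to Ethernet/IP/TCP/iSCSI framing

raw_mb_per_sec = LINK_SPEED_BITS / 8 / 1_000_000           # 125 MB/s on the wire
usable_mb_per_sec = raw_mb_per_sec * (1 - PROTOCOL_OVERHEAD)

print(f"Theoretical: {raw_mb_per_sec:.0f} MB/s, realistic ceiling: ~{usable_mb_per_sec:.0f} MB/s")
```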

Please award points to any useful answer.

AnatolyVilchins

That SAN hardware is certified for VMware, so get your support to look into it. A common cause of bad performance is overloading the interfaces of the SAN hardware: if you have multiple connections to the same SAN, not all of them can be served at maximum speed.

Also, your local disk will always be faster than your SAN in this setup, because even a SATA disk has up to 3 Gb/s of bandwidth, so your SAN will never match the speed of your local disks. You are probably also using Ethernet instead of Fibre Channel, which does not help performance either.

You use a SAN not only for speed, but to have a centrally managed place where you can put all your important data and make sure a suitable RAID level is applied. There are also features like replication, which is one of the advantages of having a SAN.

from http://serverfault.com/questions/106352/slow-read-write-performance-over-iscsi-san

Starwind Software Developer

www.starwindsoftware.com

Kind Regards, Anatoly Vilchinsky
alapierre
Contributor

Ok, thanks. I'll try some of those things. Is there a best practice for testing data throughput between a SAN and Host?

AnatolyVilchins

http://www.iometer.org/

Starwind Software Developer

www.starwindsoftware.com

Kind Regards, Anatoly Vilchinsky
J1mbo
Virtuoso
(Accepted Solution)

Another vote for IOMeter.

Try a 32K, 100% sequential read test (and then write) with 64 outstanding IOs; this will give sequential performance. It should be close to 100 MB/s per active GigE path, depending on how much the storage system can push out.

Then a 32K, 0% sequential (i.e. random) read (and then write) test with 64 outstanding IOs against a good-sized test LUN (say 4 GB+) will give a value for IOPS, which is the main driving factor for virtualisation. Look at the latency: it usually needs to stay below about 50 ms, so you can work out whether the default of 32 outstanding IOs per host is OK (say you had six hosts, the array would need to be able to deliver random IO with latency under 50 ms at 192 outstanding IOs (= 32 x 6)).
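To make the arithmetic in that last bracket explicit, here is a minimal sketch (the 32-IO-per-host default and the 50 ms target are the figures quoted above; the function name is just for illustration):

```python
# Combined queue depth the array has to absorb if every host that can
# reach the LUN fills its default outstanding-IO queue.

DEFAULT_QUEUE_PER_HOST = 32   # ESX default outstanding IOs per host (per the post)
LATENCY_TARGET_MS = 50        # rule-of-thumb ceiling for random IO latency

def total_outstanding_ios(num_hosts, queue_per_host=DEFAULT_QUEUE_PER_HOST):
    """Worst-case outstanding IOs presented to the array by num_hosts hosts."""
    return num_hosts * queue_per_host

# Example from the post: six hosts sharing the LUN.
print(total_outstanding_ios(6), "outstanding IOs must stay under",
      LATENCY_TARGET_MS, "ms")   # 192 outstanding IOs must stay under 50 ms
```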

Don't use the 'test connection rate' option, as this effectively tests only cached throughput, which we're not so interested in anyway.

Please award points to any useful answer.

TobiasKracht
Expert


Multipathing may be causing your issue. Are you able, and have you tried, to disable multipathing and just have one 1 Gb connection to your SAN? VMware may be path thrashing when put under load because of a bad link, or a delay in packet delivery...

BTW, your maximum throughput with a 1 Gb link is going to be ~30 MB/s if your SAN and ESXi host were the only two devices on that link...

StarWind Software R&D http://www.starwindsoftware.com
alapierre
Contributor

Ok, over a single connection I'm getting 117 MB/s read through IOMeter. I enabled multipathing, and with 64 outstanding IOs I'm now getting:

135 MB/s 100% Sequential Read - 17ms response time

132 MB/s 100% Sequential Write - 15ms response

26 MB/s 100% Random Read - 76ms response

15 MB/s 100% Random Writes - 140ms response

Are the random numbers ok?

I believe I was mistaken in my initial assumption. The operation I was performing still isn't using more than 10 MB/s; it is 100% read, probably mostly random. Is there any setting that throttles a VM's disk access speed?

Off-topic question: should I have the swap file located on the SAN with the VMs, or on a local 7200 rpm datastore?

Thanks for all your help, I'm getting a much better picture of what's going on.

J1mbo
Virtuoso

Glad things are progressing. Your results show that the array cannot handle that level of outstanding IOs with reasonable latency - I would re-run the random IO tests, reducing the outstanding IOs until the write latency is about 50 ms, to find that level (I would guess about 22).

Then consider how many hosts will be accessing each LUN. If you have two hosts potentially accessing this LUN, you would probably want to limit each of their queues to say 11 or 12 via Disk.SchedNumReqOutstanding in Advanced Settings in ESX.
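Splitting the measured queue depth across hosts is just division; something like the sketch below (the 22-IO figure is the guess above and two hosts is the example from this thread; Disk.SchedNumReqOutstanding itself is set in the ESX Advanced Settings, not from Python):

```python
# Pick a per-host Disk.SchedNumReqOutstanding value by dividing the queue
# depth the array can sustain at ~50 ms latency across the hosts sharing the LUN.

def per_host_queue_depth(sustainable_ios, num_hosts):
    """Largest whole per-host queue that keeps the combined load within budget."""
    return max(1, sustainable_ios // num_hosts)

# Example from the thread: the array copes with ~22 outstanding IOs and
# two hosts can access the LUN, giving 11 per host.
print(per_host_queue_depth(22, 2))   # -> 11
```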

HTH

J1mbo
Virtuoso

To add (sorry!): from the numbers I guess this is a RAID-5 array? Assuming this is all still in test at the moment, if you can afford the space difference it may be worth reconfiguring the iSCSI device as RAID-10 and re-running the tests; the difference can be dramatic, especially for writes. As a point, random tests are usually reported as an IOPS number rather than MB/s.
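For completeness, converting those random-test MB/s figures into IOPS is just throughput divided by IO size; a quick sketch (the 32K block size and the 26/15 MB/s numbers come from the earlier results):

```python
# Convert throughput from a fixed-block-size random test into IOPS.

BLOCK_SIZE_KB = 32   # block size used in the IOMeter tests above

def iops_from_throughput(mb_per_sec, block_kb=BLOCK_SIZE_KB):
    """IOPS = throughput / IO size (throughput in MB/s, IO size in KB)."""
    return mb_per_sec * 1024 / block_kb

print(round(iops_from_throughput(26)))   # 100% random read  -> ~832 IOPS
print(round(iops_from_throughput(15)))   # 100% random write -> ~480 IOPS
```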

Re swap files, there are several considerations. Perhaps most importantly, if vSwap is on local storage, the VM cannot be vMotion'd or HA'd. Also, vSwap is really a last resort for an ESX box short of physical RAM; I would make mighty sure it is never normally used, since the massive disk IO will absolutely destroy the performance of everything running. This is probably why it's always recommended to allocate a VM only the minimum RAM it needs. Also make sure VMware Tools are installed, since the balloon driver gives ESX a much gentler mechanism for reclaiming RAM from VMs (using the running OS's native paging capabilities).

Again, HTH :-)

alapierre
Contributor

You're good, J1mbo.... 22 IOs brought it right to 50 ms. It is RAID 5, and I unfortunately can't sacrifice the space. Perhaps next year I'll get new drives and RAID 10 it. It will be one host accessing it at a time, with about 6 VMs on it. I have another host that will be able to access it, but in most cases it won't unless the main server goes down for maintenance. Do you think this performance will be OK for 5 or 6 VMs on a single 350 GB LUN? The VMs will be a web server, an accounting server, a file server, and a couple of very low-resource servers. In hindsight, this may not have been a good plan, but the file server will have its files located on the same SAN, on a different 1.5 TB LUN. I'm stuck doing it this way for now, so hopefully it's not horribly slow.

Josh26
Virtuoso

Nothing in your list looks to be overly IO intensive. Since you don't mention databases or Exchange servers, it's a stretch to suggest RAID 5 will perform too poorly for this scenario. It's likely that if there's an issue, it's somewhere else.

What sort of switch are you using? Are your server's NICs dedicated to iSCSI, and are they on a separate VLAN/physical network from your data LAN?

Edit: You mentioned using the same switch so they obviously aren't physically separated. If you haven't separated the VLANs, I would suggest looking there.

alapierre
Contributor

We use several MySQL databases for web programs, but only the main website database is heavily trafficked... and I use the word 'heavily' loosely.

I'm using an HP ProCurve 1400 24-port GigE switch, unmanaged. I have two NICs dedicated to iSCSI, both on separate subnets, plugged into the same switch, and two NICs on the iSCSI SAN with subnets to match each of those. The only thing between the host and the iSCSI SAN is that one switch, and the only things on those two subnets are those 4 NICs. However, that switch also passes packets for my data LAN, which is on a different subnet altogether. I could get a separate 8-port GigE switch just for the host and SAN if you think the 24-port switch may slow things down since it's also used for my data LAN.

There are only 7 or 8 things plugged into that switch in total; most of our data traffic happens in the building across the way. VLANs are something I don't fully understand - is there a good write-up on them somewhere? Thanks for your suggestions.

lyhigh
Contributor

Yeah, I agree. Any other new alternative solutions out there?

lyhigh
Contributor

I am looking for new stuff out there that improves this iSCSI SAN setup. Any ideas?

Josh26
Virtuoso

What you are describing is a setup where both data and iSCSI traffic flow over one network.

I can't say this is surely your problem, but it's definitely not best practice and something you should look to fix. By placing the storage network on a separate VLAN, you effectively break your switch up into two smaller ones, fixing this issue. However, as you have said it is unmanaged, this will not be an option.

Using a separate switch would be useful, but be careful here too. If a smaller switch happens to be a cheaper switch, you're running into the limits of its capabilities. You really do need something with flow control, jumbo frames and a fast backplane to run iSCSI with good performance.

alapierre
Contributor

Ok, I'll look into getting a decent separate switch. I read that ESXi doesn't support Jumbo frames, so I disabled them on my SAN. I would love to be able to use them though if I'm mistaken.

Josh26
Virtuoso

"Yeah, I agree. Any other new alternative solutions out there?"

What exactly are you agreeing with? That the original poster has a performance issue?

Edit: An alternative to iSCSI?

Sure: Fibre Channel, and 10 GbE iSCSI. The former has been around for a while.

Message was edited by: Josh26

Josh26
Virtuoso

"Ok, I'll look into getting a decent separate switch. I read that ESXi doesn't support Jumbo frames, so I disabled them on my SAN. I would love to be able to use them though if I'm mistaken."

Refer here:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101245...

ESXi free does not support Jumbo Frames - although you can actually enable them during your initial evaluation period and have them stay enabled.

That said, with the network you describe, you would benefit from a paid version of ESXi, and any edition will provide this support. Jumbo frames make a world of difference to performance with iSCSI.

alapierre
Contributor

Thanks for your help Josh. I'll look into the pricing on it.

Would something like the Netgear JGS516 (GigE, 2 MB buffer, jumbo frames, flow control, unmanaged) be good enough? It's about $170 on Newegg.
