VMware Cloud Community
GBTurpin
Enthusiast

Horrible FC SAN LUN Performance ESX 3.0.2 & EMC CX500

I've got two identical Dell PE1850 servers, one running Red Hat Enterprise Linux (RHEL) 5 natively and the other running ESX 3.0.2 with RHEL5 as a guest VM.

Both are connected to a Fibre Channel SAN with two HBAs and multipathing enabled.

PERFORMANCE IS HORRIBLE in the ESX RHEL5 Guest.

I see about 4.6MB/sec when copying up to the RHEL5 VM.

I see about 57MB/sec when copying files down from the RHEL5 VM.

I got my statistics using these two commands:

time dd if=/dev/zero of=/backup/testfile bs=16k count=16384

time dd if=/backup/testfile of=/dev/null bs=16k
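
Both of those runs go through the guest page cache, so for reference, a variant that bypasses the cache would look roughly like this (assuming the dd shipped with RHEL5 supports the direct I/O flags):

time dd if=/dev/zero of=/backup/testfile bs=16k count=16384 oflag=direct   # write test, bypassing the page cache

time dd if=/backup/testfile of=/dev/null bs=16k iflag=direct   # read test, bypassing the page cache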

When I share a LUN out to the non-VMware RHEL 5 server and run the same tests, I get amazing performance:

260MB/sec upload to SAN, and 2GB/sec down.

This is a HUGE difference.

Can anyone tell me why sending data to the VM RHEL 5 file server is so slow?

The LUNs for VMware are set up like this:

SAN LUN -> ESX Host -> VMFS3 -> VMDK

One is 1TB and the other is 1.9TB, both with 8MB blocks and formatted as VMFS3. The second holds a single huge 1.9TB VMDK.
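
For anyone who wants to double-check the same things on their end, the VMFS block size and the FC path policy can be pulled from the ESX service console with something like the following (command names are from memory and the datastore name is a placeholder, so treat this as a sketch):

vmkfstools -P /vmfs/volumes/<datastore-name>   # shows VMFS version, block size and extents

esxcfg-mpath -l   # lists the FC paths to each LUN and the current path policy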

4 Replies
GBTurpin
Enthusiast

Oh, and for completeness' sake, let me list all of the other things I've tried as well:

I've installed NETPERF on the ESX RHEL5 VM and a Fedora 7 box on the same network. That shows that the 1000BaseT performance on the NFS clients is excellent and not a problem.
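
The network check itself was nothing exotic; roughly the following, with the hostname being a placeholder (netserver on the Fedora 7 box, netperf from the RHEL5 VM):

netserver   # run on the Fedora 7 box to accept the test

netperf -H fedora7-box -t TCP_STREAM -l 30   # run on the RHEL5 VM: 30-second TCP throughput test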

VMware has checked the logs on the ESX box, and the storage sub-systems don't have any errors.

I've also spoken with some of the folks who use MANY CX500s at a VERY large company, and they've indicated that they are seeing the same types of problems between VMware ESX and the EMC CX500 SAN.

It appears that when they contacted VMware, as long as any communication seemed to be happening between the SAN and the ESX host, VMware really didn't seem to care about solving the issue.

They have EMC coming out to take a look at things and determine where the problem lies.

If anyone has a similar experience, and especially a solution, it would be good to know.

(VMware Platinum Tech Support doesn't seem eager to help me solve this; they've been sitting on it for 5 days... and that's with me calling them and asking when a solution will be forthcoming.)

Erik_Zandboer
Expert

Hi,

You probably thought of this, but I have not seen it in the details... Did you install VMware Tools on the VM?

Visit my blog at http://www.vmdamentals.com
GBTurpin
Enthusiast

Yep, running ESX 3.0.2 Update 1, and installed VMware Tools for Linux. All of the drivers are running and working.

To add to this:

This AM I powered down the real RHEL 5 1850 and attached the LUN to the VMware storage group in the CX500 in NaviSphere. I then told ESX to map that LUN as a RAW Device Mapping to the VM RHEL5 Guest.
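
(For reference, the service console equivalent of creating that mapping is something like the line below; the vmhba device name and datastore path are placeholders and the flag is from memory, so double-check it before using it.)

vmkfstools -r /vmfs/devices/disks/vmhba1:0:1:0 /vmfs/volumes/Storage1/rhel5vm/rhel5vm-rdm.vmdk   # create a virtual-compatibility RDM pointer file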

The performance is BAD, but there is a twist!

The first run yields the following results:

# time dd if=/dev/zero of=/rawdisk/testfile bs=16k count=16384

16384+0 records in

16384+0 records out

268435456 bytes (268 MB) copied, 5.70876 seconds, 47.0 MB/s

real 0m6.038s

user 0m0.011s

sys 0m5.719s

# time dd if=/dev/zero of=/home/rawdisk bs=16k count=262144

262144+0 records in

262144+0 records out

4294967296 bytes (4.3 GB) copied, 233.818 seconds, 18.4 MB/s

real 3m53.880s

user 0m0.782s

sys 2m58.990s

#

The second run, by which point the Linux server is using almost all of its RAM, shows:

# time dd if=/dev/zero of=/rawdisk/testfile bs=16k count=16384

16384+0 records in

16384+0 records out

268435456 bytes (268 MB) copied, 10.5712 seconds, 25.4 MB/s

real 0m10.607s

user 0m0.068s

sys 0m10.384s

# time dd if=/dev/zero of=/home/rawdisk bs=16k count=262144

262144+0 records in

262144+0 records out

4294967296 bytes (4.3 GB) copied, 208.846 seconds, 20.6 MB/s

real 3m30.680s

user 0m1.113s

sys 2m36.474s

Now, when I read the 268MB file back, I get OK read speeds. (It's still WAY slower than the HW-based system, which gets 2GB/sec.)

# time dd if=/rawdisk/testfile of=/dev/zero bs=16k

16384+0 records in

16384+0 records out

268435456 bytes (268 MB) copied, 0.84043 seconds, 319 MB/s

real 0m0.852s

user 0m0.015s

sys 0m0.833s
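
A side note on that read number: 319MB/sec is more than a 2Gbit FC link can physically deliver, so that read is clearly being served out of the guest page cache. Dropping caches between runs should keep the tests honest; on the RHEL5 (2.6.18) kernel this works as root:

sync   # flush dirty pages to disk first

echo 3 > /proc/sys/vm/drop_caches   # then drop the page cache, dentries and inodes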

Now, I'm noticing that my RHEL5 Server VM Guest is running SLOWLY after all of the file transfers. So the performance of the VM degrades after massive file transfers...

In Closing:

So, it appears that a RAW DEVICE MAPPING is a LITTLE better, but still quite bad.

Also, using a REAL HW based system with RHEL5 is MASSIVELY faster. This is not typically a problem I see with VMware.

Any SAN gods out there have any performance tips I should check for the HBAs and such under VMware?

Gerhard

GBTurpin
Enthusiast

So,

A little egg on my face, but the support experience with VMware is still just as much of a problem.

Basically, the case got escalated to a senior-level engineer. The deal is that after 5 days of VMware troubleshooting things, with a serious lag between callbacks, I noticed something:

In NaviSphere there is a setting to enable "Write Caching," which somehow got turned off. I suspect it may have happened during the scheduled maintenance by EMC, but I'm really not sure.

Under Windows-based VMs the performance is good (approximately 60-70MB/sec read speed, or at least I'm told that's good...).

A physical Windows 2003 server with an RDM gets around 100MB/sec, so there is certainly a difference.

The biggest concern for me is NFS clients and their performance: when I ran a test with a server on ESX (as a guest) and a client writing to a LUN over NFS, I got 20MB/sec. That's a big difference in speed.
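
For anyone wanting to repeat that NFS client test, it boils down to a mount and a timed write along these lines (server name and export path are placeholders):

mount -t nfs rhel5vm:/backup /mnt/nfstest

time dd if=/dev/zero of=/mnt/nfstest/testfile bs=16k count=16384

umount /mnt/nfstest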

So clearly I need to test a few more things. Overall, write caching was the big problem...

The VMware guy did notice the setting issue, but by then I'd figured it out.

Testing with RHEL5 still needs to be done, and I'm not certain that it'll work as well. I'll post the results.

Interestingly, the senior tech did indicate that using all zeros wasn't a good test, but I honestly can't recall the reason.
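
My guess is that it's because all-zero data is about the easiest case a storage stack can see (and trivially compressible), but don't quote me on that. If anyone wants to repeat these tests with non-zero data, generating a random file once and reusing it as the source would look like this (paths are placeholders, and /dev/urandom is slow, so don't time the first step):

dd if=/dev/urandom of=/tmp/random.src bs=16k count=16384   # generate 256MB of random data once (slow)

time dd if=/tmp/random.src of=/backup/testfile bs=16k   # timed write of non-zero data to the SAN-backed filesystem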

Lastly, when I finally ran the last Windows test and put it on a LUN formatted as VMFS3 with a 245GB VMDK, I was getting 700-800MB/sec, which is physically impossible over a 2Gbit FC link (that tops out around 200MB/sec). So VMware ESX was doing something odd... though it didn't really appear to be an issue.

Gerhard
