VMware Cloud Community
lucheman
Enthusiast

Poor SAN Performance???

We have just set up 3 new HP DL380 G5's with ESX 3.5 and all the latest patches (build 82663). We are also running VC 2.5. Each ESX box has one dual-port QLogic QLA2432 HBA installed, and only one port on each HBA is connected to the SAN. The array is an EMC NS20, and we are strictly using the FC part of the NS20, not the Celerra part...

The strange thing is that there are no FC switches between the ESX servers and the NS20; all 3 servers are directly connected to different storage processors on the NS20.

The problem is that if I am working with a VM on the local drives of these ESX servers, all is fine. But if I am working on or building a VM on the LUNs on the NS20, it is EXTREMELY SLOW - probably 25 or 30 times as slow as the local disks. For example, if I am installing Windows on a VM stored on the local disks of an ESX server, the file copy at the beginning of the installation (where it copies the files from the CD to the hard drive) takes about 3 minutes. If I am installing Windows on a VM stored on any of my four 300GB LUNs on the NS20, the file copy part takes around 1.5 to 2 hours!
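If it helps, the difference also shows up in a raw sequential write from the service console; something like this should reproduce it (the datastore names below are placeholders, not my real ones):

# rough sequential write test from the ESX service console
# datastore names are placeholders - substitute your own
time dd if=/dev/zero of=/vmfs/volumes/local-ds/testfile bs=1M count=512
time dd if=/dev/zero of=/vmfs/volumes/ns20-lun1/testfile bs=1M count=512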

Has anybody seen this type of behavior before in a similar config?

Any suggestions for fixing it?

Thanks in advance guys,

Jon

1 Solution

Accepted Solutions
mcowger
Immortal

I guess what I'm saying is that it won't help, because you will still end up with path thrashing across the 3 SPs. There are 2 solutions to your problem:

1) Get some FC switches.

2) Drop one of the SPs, leaving you with 2, and connect each ESX server to each SP. Set up your pathing appropriately so you are using MRU pathing with identical preferred paths on each host (e.g. LUN A preferred down SP A, LUN B down SP B). This assumes your SPs have enough ports to accommodate 3 FC connections each. This will alleviate the performance problems, but certainly isn't scalable.

--Matt VCDX #52 blog.cowger.us

19 Replies
Paul_Lalonde
Commander

Hey Jon,

Write-caching enabled on the SAN? Bad cache battery?

Any unusual errors or warnings in /var/log/vmkwarning or /var/log/vmkernel ?
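A quick way to pull out the relevant lines (adjust the patterns to taste):

# surface SCSI errors and path/failover noise in the VMkernel logs
grep -iE "scsi|path|fail" /var/log/vmkwarning
grep -iE "failover|trespass" /var/log/vmkernel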

Paul

mike_laspina
Champion

Hi,

Check the /var/log/vmkernel for path failover events.

It sounds like the SAN is thrashing due to an incorrect path configuration.

Post the output of:

esxcfg-mpath -l

http://blog.laspina.ca/ vExpert 2009
mcowger
Immortal

Methinks Mike is right and you are thrashing the crap out of the paths - the NS20 isn't an active/active device.

--Matt VCDX #52 blog.cowger.us
mike_laspina
Champion

Hi Matt,

I guess we should mention to make sure the failover policy is MRU and not fixed.

:)
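If a LUN does turn out to be set to fixed, something along these lines should flip it to MRU (syntax from my memory of ESX 3.x, and the vmhba ID is just an example - double-check with esxcfg-mpath -h):

# list LUNs/paths, then set the MRU policy on one LUN
esxcfg-mpath -l
esxcfg-mpath --policy=mru --lun=vmhba1:0:1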

http://blog.laspina.ca/ vExpert 2009
mcowger
Immortal

Not sure that will help him here. From his description, it sounds like he's attached each host directly to one and only one SP (no intervening switching fabric), so each machine has only 1 path. However, each host is accessing the same LUN, so LUN ownership is being forcibly moved. Given that each host only sees 1 path, changing his pathing policy won't help.

The OP would need to build a proper switched architecture, or start using the Celerra features of his NS20 and serve the storage over NFS.

Of course, this depends on my assumption being right :)
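For the record, if the NFS route is taken, mounting a Celerra export as a datastore is quick from the service console (the IP and export path here are made up):

# add and list an NFS datastore - address and export are placeholders
esxcfg-nas -a -o 192.168.1.50 -s /vmstore nfs_datastore1
esxcfg-nas -l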

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

I am trying to determine if the NS20FC is set to active/active or active/passive. Are you saying that there is no way it could be active/active because the only option for an NS20 is active/passive? Do you know how I can tell from the EMC side what it is set to?

mike_laspina
Champion

Also, the NS20 has 4 FC ports per card: 2 are tape ports (I'm not sure you can use the tape FC ports for disk; some units allow it) and 2 are disk ports, so the configuration will likely need a low-end switch to continue with FC-based connectivity.

http://blog.laspina.ca/ vExpert 2009
mcowger
Immortal

The NS20 is not an active/active device in any configuration - it's simply not capable of it. Every time you switch the path through which a LUN is being accessed, it switches the ownership of that LUN to the new path, and that incurs a significant penalty (could be on the order of seconds) on the disk access that causes the path switchover.

Your connection methodology of directly connecting each system to a separate SP is causing the poor performance you are seeing. You need at least 1 (preferably 2) Fibre Channel switches.
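You can also watch the ownership moving from the CLI if you have navicli handy - compare each LUN's default owner to its current owner (the SP address is a placeholder and the flags are from memory, so check the navicli help output first):

# if Current Owner keeps diverging from Default Owner, the LUN is trespassing
navicli -h 192.168.1.10 getlun 0 -default -owner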

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

Thank you. That helps me confirm what we believe is happening here. Each server has only one path directly to the NS20 storage processor ports. However, these ESX servers all have access to all four of my LUNs (cluster). So what I think is happening is that as one server accesses LUN1 through its path from SP A, and then another server accesses LUN1 through its path from SP B, the ownership of the LUN is transferred and a trespass occurs. I am looking in Navisphere Mgr, and on these four LUNs that have been set up for only 2 days, we see the # of trespasses is over 6,000 already and steadily increasing as we watch it. On one LUN it is over 81,000 trespasses. I believe that is the problem, would you agree?

We are first going to try cabling up the other ports on these HBAs to see if ESX multipathing will help resolve this issue.

mcowger
Immortal

Yes, your # of trespasses confirms this is a path thrashing issue. No question.

If you have 3 SPs (implied by your statement that each of the 3 servers is connected to its own SP), there's no way to connect your hosts such that you will not end up with trespasses. You need a proper switch setup.

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

Can I just connect the servers with a second fiber path? ESX should then be able to handle the multipathing correctly, right?

mcowger
Immortal

How are you going to connect a server to 3 paths (1 per SP) with only 2 fiber ports on each server?

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

Right now each server only has 1 path. I am just talking about adding a 2nd path...

Rumple
Virtuoso

If you've invested this much money into an NS20 as well as 3 FC-connected ESX hosts, then you should complete this implementation by installing 2 FC switches. Cross-connect the 2 SPs on the NS20 to the fiber switches, connect each ESX server to both switches, and complete all zoning.

What you've effectively done at this point is bought yourself a nice LCD HD screen and a full 7.1 audio system, but you're running an analog TV signal and wondering why the picture sucks and you're only getting 2-channel sound.

Until all the pieces are in place, you are going to be sorely disappointed in the results, and you'll give everyone the impression that VMware/NS20 is not a go-forward technology, when in fact it's the implementation that's been done poorly.

mcowger
Immortal

You have 3 SPs. You have 3 servers, each with 2 ports. How are 2 ports going to connect to 3 SPs (which you have to do to avoid path thrashing)?

Until you fix this properly with switches, you are going to continue to path thrash.

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

I am a consultant and am simply working with the tools I have been given. I have obviously made the recommendation for the customer to buy a couple of FC switches, but that is up to the customer to decide. In the meantime, I am just trying to determine whether or not we can get the trespassing and performance degradation fixed by connecting 2 paths from each ESX host and letting ESX handle the multipathing. If you know whether that will work or not, please advise; if not, please go back to doing whatever you were doing before you wrote your post.

Rumple
Virtuoso

If you want to get snarky, then as the consultant you should know if it's going to work before you go in there screwing around with their environment and making a mess of it on their dime.

Your job is to go in there and make a recommendation on how they should implement this particular technology. If they are unwilling to pay the money or want a half-assed solution, then as an ethical consultant your job is not to implement a half-assed solution and bill them excessive hours, but to stick by your recommendation and your reputation and walk out the door.

When the project falls on its face they can then come back begging to have it done correctly, or they will continue to try to make it work, and then it's not your headache or reputation getting smeared.

But then again, your reputation is your problem, not the customer's, so they don't care.

mcowger
Immortal

I guess what I'm saying is that it won't help, because you will still end up with path thrashing across the 3 SPs. There are 2 solutions to your problem:

1) Get some FC switches.

2) Drop one of the SPs, leaving you with 2, and connect each ESX server to each SP. Set up your pathing appropriately so you are using MRU pathing with identical preferred paths on each host (e.g. LUN A preferred down SP A, LUN B down SP B). This assumes your SPs have enough ports to accommodate 3 FC connections each. This will alleviate the performance problems, but certainly isn't scalable.
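Run identically on every host, it would look roughly like this (the vmhba/LUN IDs are examples only, and the flags are from my memory of the 3.x esxcfg-mpath - verify with esxcfg-mpath -h before trusting them):

# same on each host: MRU policy, LUN A pinned through SP A, LUN B through SP B
esxcfg-mpath --policy=mru --lun=vmhba1:0:1
esxcfg-mpath --preferred --path=vmhba1:0:1 --lun=vmhba1:0:1
esxcfg-mpath --policy=mru --lun=vmhba1:0:2
esxcfg-mpath --preferred --path=vmhba2:0:2 --lun=vmhba1:0:2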

--Matt VCDX #52 blog.cowger.us
lucheman
Enthusiast

Your 2nd option here is what I was trying to say, and I just set it up and it is working well... thank you. We have two of the servers dual-connected right now (short on FC cables) and are using the MRU failover policy. The trespassing has stopped and performance is 100 times better. They do have the ports on the NS side to cable up the 3rd server when the cables get here tomorrow. I have made the recommendation to the customer to buy the FC switches, but at least in the meantime their environment will be able to run - but not scale. Thanks so much for your help.
