VMware Cloud Community
0v3rc10ck3d
Enthusiast

Updating Equallogic Firmware

Hello,

I've contacted EqualLogic a few times and I can't seem to get a straight answer from them about this, and I want to be 100% sure before going through this procedure.

We currently have two PS4100VX arrays in a single Group, the volumes are balanced across these two arrays.

Both arrays are running firmware version 5.2.2, and the total usage of the Group is about 70% capacity.

Because of the capacity, I am unable to evacuate the volumes to a single array for the firmware updates.
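The capacity problem is simple arithmetic. A minimal sketch, assuming both members have the same usable capacity: at 70% group usage the data amounts to 1.4 members' worth, which cannot fit on the one member that would remain. The function names and the `reserve_fraction` margin (for example, snapshot reserve) are hypothetical.

```python
def data_in_member_units(group_used_fraction: float, members: int) -> float:
    """Express the group's stored data in units of one member's capacity.

    Assumes every member in the group has the same usable capacity.
    """
    return group_used_fraction * members


def can_evacuate_one_member(group_used_fraction: float, members: int = 2,
                            reserve_fraction: float = 0.0) -> bool:
    """Return True if all data fits on the members left after evacuating one.

    reserve_fraction is a hypothetical free-space margin to keep on each
    remaining member; 0.0 means no margin at all.
    """
    if members < 2:
        raise ValueError("need at least two members to evacuate one")
    needed = data_in_member_units(group_used_fraction, members)
    available = (members - 1) * (1.0 - reserve_fraction)
    return needed <= available


if __name__ == "__main__":
    # Two equal PS4100VX members, group about 70% full (figures from the post).
    print(round(data_in_member_units(0.70, 2), 2))  # 1.4 member capacities needed
    print(can_evacuate_one_member(0.70))             # False
```

Even at 50% usage the check only just passes with zero margin, so in practice evacuation needs the group well below half full.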

We are currently running a few hundred VMs, so pausing them all and then unpausing isn't really an option. Obviously, I want to avoid crashing all the VMs.

A few people at EqualLogic said it should be fine to update the firmware on an array: it updates one controller at a time and then switches the passive controller to active, though there could be a delay of up to 30 seconds.

If that's the case, how are the hosts/VMs going to react to this? Also, will there be issues with my volumes being stretched across both arrays when I suddenly lose connectivity to half of each volume?

I'm looking to update to 6.0.2 to benefit from the VAAI iSCSI UNMAP feature as well as simply keeping it up to date.

I'm running the Equallogic MEM Driver 1.1.2 on all datastores on all hosts.

Any help, advice or direction is much appreciated. Thanks!

VCIX6 - NV | VCAP5 - DCA / DCD / CID | vExpert 2014,2015,2016 | http://www.vcrumbs.com - My Virtualization Blog!
Ethan44
Enthusiast

Hi

Welcome to the communities.

Before upgrading the firmware, we need to find the root cause.

In my experience, when you call any product's tech support, they first tell you to update the OS, firmware, BIOS, etc.

My experience with this was very bad, because after the upgrade I was in the same place.

So please find the root cause before upgrading the firmware.

"a journey of a thousand miles starts  with a single step."
dwilliam62
Enthusiast

Hello,

There is a PDF included with the firmware documentation on the EqualLogic support site called "OS considerations guide". It has information about setting the disk timeout value for a number of OSes. The disk timeout value needs to be set on all your VMs and physical hosts.
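For Linux guests, the per-disk SCSI timeout is exposed through sysfs at `/sys/block/<dev>/device/timeout`. A minimal sketch, assuming a 60-second target (matching the ESX login timeout; take the actual value from the OS considerations guide) and `sd*` device names. The function name is hypothetical, it needs root on a real guest, and the change does not survive a reboot.

```python
import glob
import os


def set_scsi_disk_timeouts(seconds: int, sys_block: str = "/sys/block") -> dict[str, int]:
    """Write `seconds` to every sd* device's timeout file under sys_block.

    On a Linux guest this is /sys/block/sdX/device/timeout. The change lasts
    only until reboot; making it persistent (e.g. a udev rule) is not covered.
    Returns each device name with the value read back after writing.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(sys_block, "sd*", "device", "timeout"))):
        with open(path, "w") as f:
            f.write(f"{seconds}\n")
        with open(path) as f:
            value = int(f.read().strip())
        # .../sdX/device/timeout -> sdX
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        results[device] = value
    return results


if __name__ == "__main__":
    print(set_scsi_disk_timeouts(60))
```

Reading the value back after writing confirms the kernel accepted it rather than assuming the write succeeded.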

In ESX, the iSCSI login timeout needs to be adjusted to 60 seconds.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=200782...

The firmware upgrade process by default updates the flash on both controllers, then restarts the secondary to bring it up to the new level. Once that completes and it re-syncs with the primary, the primary is restarted to force a failover to the upgraded secondary. Depending on the model and the load on the member, this typically takes 20 to 60 seconds. With the timeouts set, servers will ride out this period without a problem. Doing upgrades during a low-I/O period is also suggested.
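The timeouts matter because each one has to outlast the controller failover. A minimal sketch of that comparison, using the worst case of the 20 to 60 second range above; the setting names and values in the example inventory are hypothetical, not measured defaults.

```python
# Worst case of the typical 20-60 second failover window described above.
WORST_CASE_FAILOVER_S = 60


def settings_at_risk(timeouts_s: dict[str, int],
                     worst_case_failover_s: int = WORST_CASE_FAILOVER_S) -> list[str]:
    """Return the names of timeouts shorter than the worst-case failover.

    A timeout equal to the failover window is treated as riding it out;
    in practice some margin above it is safer.
    """
    return sorted(name for name, t in timeouts_s.items() if t < worst_case_failover_s)


if __name__ == "__main__":
    # Hypothetical inventory: example values only.
    timeouts = {
        "esx_iscsi_login": 60,
        "windows_vm_disk": 60,
        "linux_vm_disk": 30,
    }
    print(settings_at_risk(timeouts))  # ['linux_vm_disk']
```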

The other important consideration for making these changes is being able to handle a controller failure.

Regards,

Don


dwilliam62
Enthusiast

Sorry, I missed that you were running MEM 1.1.2. That sets the login timeout automatically for ESX. The DiskTimeOut values still need to be set in the VMs, though.

Also, 6.0.4 will be released soon. I would wait and upgrade to that instead of 6.0.2.

Other things to do are disabling DelayedACK and Large Receive Offload (LRO) in ESX.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100259...

Solution Title: HOWTO: Disable Large Receive Offload (LRO) in ESX v4/v5

Solution Details: Within VMware, the following command will query the current LRO value.

# esxcfg-advcfg -g /Net/TcpipDefLROEnabled

To set the LRO value to zero (disabled):

# esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled

NOTE: a server reboot is required.
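To check many hosts, the `-g` output can be collected (for example over SSH) and checked by a script. A minimal sketch, assuming the query prints a line of the form `Value of TcpipDefLROEnabled is 1`; verify that format on your ESX build. The function names are hypothetical.

```python
import re

# Assumed output shape of `esxcfg-advcfg -g /Net/TcpipDefLROEnabled`,
# e.g. "Value of TcpipDefLROEnabled is 1". Verify against your ESX build.
_ADVCFG_RE = re.compile(r"Value of (\w+) is (-?\d+)")


def parse_advcfg(output: str) -> tuple[str, int]:
    """Extract the option name and integer value from the query output."""
    match = _ADVCFG_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognized esxcfg-advcfg output: {output!r}")
    return match.group(1), int(match.group(2))


def lro_needs_change(output: str) -> bool:
    """True if TcpipDefLROEnabled is still enabled (non-zero) on this host."""
    name, value = parse_advcfg(output)
    if name != "TcpipDefLROEnabled":
        raise ValueError(f"expected TcpipDefLROEnabled, got {name}")
    return value != 0


if __name__ == "__main__":
    print(lro_needs_change("Value of TcpipDefLROEnabled is 1"))  # True
```

Failing loudly on unrecognized output avoids silently marking a host as "done" when the command errored.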

Lastly, if you have multiple VMDKs (or RDMs), create additional virtual SCSI controllers in each VM (up to 4 max per VM) to increase performance. This is especially true with applications like SQL, Exchange, SharePoint, etc.
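One way to spread the disks, as a sketch: round-robin the VMDKs across up to four virtual SCSI controllers (bus 0 to 3), skipping unit 7 on each bus because that ID is reserved for the controller itself. The layout strategy and the `assign_disks` name are illustrative, not a VMware or Dell rule.

```python
MAX_CONTROLLERS = 4  # per-VM virtual SCSI controller limit mentioned above
MAX_UNIT = 15        # unit numbers 0-15 on each bus
RESERVED_UNIT = 7    # unit 7 is used by the controller itself


def assign_disks(disks: list[str], controllers: int = MAX_CONTROLLERS) -> dict[str, str]:
    """Round-robin disks across controllers; return {disk: "bus:unit"}.

    The first disk (usually the OS disk) lands on SCSI 0:0.
    """
    if not 1 <= controllers <= MAX_CONTROLLERS:
        raise ValueError(f"controllers must be 1..{MAX_CONTROLLERS}")
    next_unit = [0] * controllers
    layout = {}
    for i, disk in enumerate(disks):
        bus = i % controllers
        unit = next_unit[bus]
        if unit == RESERVED_UNIT:
            unit += 1
        if unit > MAX_UNIT:
            raise ValueError(f"SCSI bus {bus} has no free unit for {disk}")
        layout[disk] = f"{bus}:{unit}"
        next_unit[bus] = unit + 1
    return layout


if __name__ == "__main__":
    for disk, slot in assign_disks(["os", "sql_data", "sql_logs", "tempdb", "backup"]).items():
        print(f"{disk:>8} -> SCSI({slot})")
```

With five disks this puts the OS, data, logs and tempdb each on their own controller, and the fifth disk back on bus 0.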

For your VMs that have more active I/O you can also try the "Paravirtual SCSI adapter" on your data VMDKs (or RDMs)

http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.vmadmin.doc_41/vsp_...

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101039...

Regards,

Don

0v3rc10ck3d
Enthusiast

Thank you for the info. So with MEM 1.1.2, the hosts are already set for the correct timeout?

Is it required to configure the DiskTimeOut values in the virtual machines? We have quite a few VMs in a multi-tenant hosted environment and will be adding vCloud Director soon to expand on this. This would be a huge task, and I'm not even sure I have access to all the VMs.

Is there a way to disable DelayedACK and LRO globally? We have quite a few hosts in multiple clusters. I have read in some places that this is recommended, though.

I read through some of your posts and you really seem to know your stuff with this, much appreciated.

I also have a couple of other questions. I currently have a ticket open with Dell and they are trying to figure it out, but I'll throw the info out here too.

We're noticing a few errors that keep popping up saying:

"Path redundancy to storage device naa.etcetcetcetc degraded, Path vmhba35:C3:T11:L0 is down. Affected datastores: etcetc"

Then I'll receive a message about 15 seconds later saying it's active again. This only seems to be happening with a single datastore and no others, even though we have multiple datastores from the same arrays connected to the same host that is spitting out this error.

In addition to this, we recently started getting errors that say something similar to:

"Device naa.etcetcetcetc performance has deteriorated. I/O latency increased from average value of 5888 microseconds to 120037 microseconds."

I figured this had something to do with Veeam causing some extra datastore usage, or with the fact that I recently enabled Storage I/O Control on all EqualLogic LUNs and it had some learning to do to work out what the average should be.
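When these alerts pile up, converting them to milliseconds and an increase factor makes it easier to line them up against Veeam job windows. A minimal sketch, assuming the message wording quoted above (other ESX builds may phrase it differently); the function name is hypothetical. For the sample message it works out to about 5.9 ms rising to 120 ms, roughly a 20x jump.

```python
import re

# Message text as quoted in this thread; other ESX builds may word it differently.
_LATENCY_RE = re.compile(
    r"Device (\S+) performance has deteriorated\. I/O latency increased "
    r"from average value of (\d+) microseconds to (\d+) microseconds"
)


def parse_latency_event(message: str) -> dict:
    """Return the device, both latencies in ms, and the increase factor."""
    match = _LATENCY_RE.search(message)
    if match is None:
        raise ValueError("not a latency-deterioration message")
    device = match.group(1)
    avg_us, now_us = int(match.group(2)), int(match.group(3))
    return {
        "device": device,
        "average_ms": avg_us / 1000,
        "current_ms": now_us / 1000,
        "increase_factor": now_us / avg_us,
    }


if __name__ == "__main__":
    msg = ("Device naa.etcetcetcetc performance has deteriorated. I/O latency "
           "increased from average value of 5888 microseconds to 120037 microseconds.")
    print(parse_latency_event(msg))
```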

Thanks again

dwilliam62
Enthusiast

You are most welcome,

You have to set Delayed ACK and LRO on a per node basis.  A reboot is required as well.
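Since there's no single global switch, a script can at least generate the per-host steps so they're applied consistently, one host at a time. A sketch under several assumptions: the software iSCSI adapter is `vmhba35` on every host (the name from the error quoted in this thread; it can differ per host), DRS moves VMs off when maintenance mode is requested, and the `DelayedAck` key for `esxcli iscsi adapter param set` exists on your build (check with `esxcli iscsi adapter param get -A <adapter>`, or use the adapter's Advanced settings in the vSphere Client instead). The LRO command is the one from the KB above; `host_rollout_plan` is a hypothetical name.

```python
def host_rollout_plan(hosts: list[str], iscsi_adapter: str = "vmhba35") -> list[str]:
    """Return the shell commands to run on each host, one host at a time.

    iscsi_adapter is an assumption: check the software iSCSI adapter name
    on each host before running anything.
    """
    plan = []
    for host in hosts:
        plan += [
            f"# --- {host} (migrate VMs off first) ---",
            "esxcli system maintenanceMode set --enable true",
            "esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled",
            f"esxcli iscsi adapter param set -A {iscsi_adapter} -k DelayedAck -v false",
            'esxcli system shutdown reboot --reason "Disable LRO and DelayedAck"',
            "# after the host is back up:",
            "esxcli system maintenanceMode set --enable false",
        ]
    return plan


if __name__ == "__main__":
    print("\n".join(host_rollout_plan(["esx01", "esx02"])))
```

Printing the plan rather than executing it keeps a human in the loop between hosts, which matters because each one needs a reboot.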

Re: DiskTimeOut. Yes, every running VM needs to have this set, and in the case of Windows it requires a reboot of the VMs. Check some of your Windows VMs that have VMware Tools installed; the DiskTimeOut value may already be set. VMware Tools won't set the timeout for Linux, Solaris, etc., just Windows.
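On Windows the value is the REG_DWORD `TimeOutValue` (in seconds) under `HKLM\SYSTEM\CurrentControlSet\Services\Disk`. As a sketch, a small generator for a `.reg` file that can be imported on VMs where the value is missing, assuming 60 seconds (confirm against the OS considerations guide). The function name is hypothetical; a reboot is still required afterwards.

```python
DISK_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk"


def disk_timeout_reg(seconds: int = 60) -> str:
    """Build .reg file text that sets the Windows disk TimeOutValue (seconds)."""
    if not 1 <= seconds <= 0xFFFFFFFF:
        raise ValueError("TimeOutValue must fit in a REG_DWORD")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{DISK_KEY}]\r\n"
        # .reg files store DWORDs as 8 hex digits: 60 -> 0000003c
        f'"TimeOutValue"=dword:{seconds:08x}\r\n'
    )


if __name__ == "__main__":
    print(disk_timeout_reg(60))
```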

Re: Path Redundancy. Any time the array or MEM moves a connection to a different port, ESX will generate that message.

Here's a VMware KB about it.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100955...
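Because these messages are expected when connections move, it can help to confirm which device keeps flapping (the post above reports a single datastore). A minimal sketch that tallies path-down events per device, assuming the message wording quoted earlier in this thread; runtime names follow the `vmhbaN:C<channel>:T<target>:L<lun>` form. The names `PathEvent` and `flapping_devices` are hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class PathEvent(NamedTuple):
    device: str
    adapter: str
    channel: int
    target: int
    lun: int


# Wording as quoted in this thread; adjust if your ESX build differs.
_PATH_DOWN_RE = re.compile(
    r"storage device (\S+) degraded, Path (vmhba\d+):C(\d+):T(\d+):L(\d+) is down"
)


def parse_path_down(message: str) -> Optional[PathEvent]:
    """Return the device and runtime-name parts, or None if not a path-down message."""
    m = _PATH_DOWN_RE.search(message)
    if m is None:
        return None
    return PathEvent(m.group(1), m.group(2), int(m.group(3)), int(m.group(4)), int(m.group(5)))


def flapping_devices(messages: list[str]) -> Counter:
    """Count path-down events per device to see which LUN keeps flapping."""
    return Counter(e.device for e in map(parse_path_down, messages) if e is not None)
```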

Re: latency errors. Once you have implemented the best practices I outlined before (DelayedACK, LRO, multiple virtual SCSI adapters in the VMs), those errors will almost certainly stop. Also, make sure you are running the most current build of ESXi. Updated kernel and network drivers also tend to resolve those messages when the other changes don't completely eliminate them.

If you haven't done so yet, please download SANHQ from the Equallogic website.  This will provide a wealth of info for yourself and tech support when dealing with performance issues.

Regards,

Don
