Currently considering the EQL PS series as a SAN replacement for VMFS, amongst others. The big off-putter is the array reboot required to finalise firmware updates, and the impact of multiple datastores disappearing while that happens.
I found a doc which refers to the EQL Multipathing Extension Module (MEM), which allows setting the iSCSI login timeout to 60s. There is also a patch (ESXi-5.0.0-20111204001-standard) which makes that value editable. I also found a YouTube video showing a firmware update/array reboot where pings to/from a VM on that array didn't drop. So the machine didn't crash, but it would be nice to know whether ESX/vCenter panicked like mad in the background.
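For reference, on an ESXi 5.0 host with that patch (or a later build), the login timeout can be inspected and raised from the host CLI. A sketch, assuming the software iSCSI adapter is `vmhba37` (a placeholder; check yours with the `list` command first):

```shell
# List iSCSI adapters to find the software initiator's vmhba name
esxcli iscsi adapter list

# Show the current parameters for the adapter (vmhba37 is a placeholder)
esxcli iscsi adapter param get --adapter=vmhba37

# Raise the login timeout to 60 seconds so sessions can ride out the
# controller restart instead of being torn down
esxcli iscsi adapter param set --adapter=vmhba37 --key=LoginTimeout --value=60
```

If you install the MEM, my understanding is it handles this setting for you, so the manual step is only needed on hosts without it.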
Question is: will my hosts/VMs be happy with the reboot of an array, if a datastore is stretched across several members of a group? I would assume I/O on the VMs would hang, and then come back to life once the reboot has completed?
Is this only possible in vSphere 5?
We're currently still on 4.0 (but would move to 4.1, if not 5, to take advantage of the VAAI/EQL integration).
Can anyone confirm that they've updated the firmware on an EQL array underpinning a live production vSphere environment with no failure of VMs?
Hi. We've been using EqualLogic for over three years now, and the recommended process is to have a 'Maintenance Pool' and vacate the tray you want to update. I do have a site with a single array, but it only hosts basic Windows application servers (no SQL or Exchange etc.), and I have upgraded its firmware without any problem. There is a KB article about increasing your iSCSI timeout to help with this.

On our main production storage I vacate a shelf at a time and update from the CLI. You do need sufficient capacity on the remaining storage to evacuate your largest array. All your volumes then stay available throughout, and the tray you want to update has no live volumes connected. It will take a couple of weeks if you have a lot of data (11 TB takes about 30 hours on our network) and multiple arrays, but evacuating one at a time safeguards the data and the server connections.
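As a rough sanity check on those evacuation times, a back-of-envelope sketch (the ~107 MB/s effective rate is inferred from the 11 TB / ~30 hour figure above and will vary with your network and load):

```shell
# Estimate hours to drain a member: size in MB divided by effective MB/s
SIZE_MB=$((11 * 1024 * 1024))   # 11 TB expressed in MB
RATE_MBS=107                    # effective move rate in MB/s (inferred, varies)
SECS=$((SIZE_MB / RATE_MBS))
echo "$((SECS / 3600)) hours"   # prints "29 hours"
```

Plug in your own member size and an observed rate to get a planning figure before you commit to a drain window.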
In a simple test I did that, moving stuff to a separate array using a maintenance pool.
The new storage solution proposed by Dell for our primary site is:
One Group, essentially comprising:
Tier 1 Member
Tier 2 Member
Tier 3 Member
All 3 members would be in 1 pool. Therefore any volume (i.e. VMFS datastore, Windows volume) could (would) be balanced over all 3 members.
So as each member array is rebooted in turn, a portion of each volume would be unavailable.
Dell tell me that people do this live, and that once your iSCSI timeout value has been adjusted there's nothing to worry about.
I am yet to be convinced...
EqualLogic, like all enterprise SAN/NAS arrays, has two controllers. This comes into play during a firmware upgrade: when the active controller restarts, the passive/secondary controller takes over as active. The initiator does not wait for the full controller reboot; rather, it is only unable to access the storage briefly while it performs an iSCSI login.
Hope this clarifies things,
(Dell EqualLogic Technical Marketing)
Despite what Technical Marketing tell you, DO NOT mix your Tier 3 tray with the other two if they are SATA disks; keep them in separate pools. I originally had a RAID 10 15K SAS tray, a RAID 50 10K SAS tray and a RAID 6 7.2K SATA tray in separate pools, and this worked just fine. I still had a maintenance pool that I used for maintenance, but I reached a point where I had insufficient space to move my volumes and vacate the tray I was going to upgrade. Technical Sales recommended I put all my trays into one pool.
This worked fine until the Jubilee weekend. When I arrived back on the Wednesday I had lost sight of one of my volumes and had about 8,000 iSCSI disconnect errors. This was due to the iSCSI connections exceeding the 1024-connection-per-pool limit, though Technical Support also said I should not have mixed the drive types in the same pool. Technically it does work, and had done for about six months, but it pulls down the performance of the fast disks and the controllers have much more work to do keeping it all together. The result was that I had to strip out all the RAID 6 volumes on the SATA tray. This also had the benefit of spreading the iSCSI connections across two pools. The gotcha was that the number of connections reported at the CLI only shows active connections; I had MPIO configured, and the actual number of connections was double what was reported.
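To make the connection-counting gotcha concrete, a hypothetical tally (the figures are made up; the 1024-per-pool limit and the two-paths-per-connection MPIO setup are from the experience above):

```shell
REPORTED=600        # connections shown at the CLI (hypothetical figure)
MPIO_PATHS=2        # paths per connection with MPIO configured
ACTUAL=$((REPORTED * MPIO_PATHS))
POOL_LIMIT=1024
# A CLI count comfortably under the limit can still mean you are over it
if [ "$ACTUAL" -gt "$POOL_LIMIT" ]; then
  echo "over the pool limit: $ACTUAL > $POOL_LIMIT"
fi
```

In other words, budget your pool against (reported connections × MPIO paths), not the raw CLI number.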
In summary: yes, you can do it live; there is a disconnect during the controller handover, and the iSCSI timeout settings should accommodate this. My preference, while I have the space, is to vacate the tray to a maintenance pool and upgrade it empty. I would keep your Tier 3 in a separate pool. As you grow, keep an eye on your pool connections.
Thanks for sharing the experience, Gsteen.
Maintenance draining takes FOREVER, but the alternative is a nightmare scenario if things go wrong.
Once, two disks in one of my EQ boxes failed after a firmware update! Support said that should never happen... but it did, right after the box came up from the restart.
I had done a drain as well, so it wasn't a big deal, but if it had happened while live data was on there, that would not have been a fun time!