We recently defined an additional LUN to be shared between two ESX hosts, each of which is connected to the storage via two fabrics. Things didn't go as planned. Some existing SAN volumes changed LUN numbers and ended up with a different LUN number on each path. The end result was that ESX saw more volumes than actually existed, because it did not realise it was dealing with the same LUN under a different number on each path. We ended up with corruption on the volume and had to scratch the files.
Without getting into excessive finger-pointing, there is some debate about the cause of the issue. The SAN team suggest that, as ESX claims to support path fail-over, it should be able to cope, and that in any case Windows (running Veritas Volume Manager) never has a problem because it writes a signature to the disk.
You won't be surprised if I tell you that the ESX team do not entirely agree. Their view is that if the LUN numbers of the volume change then all bets are off.
From discussions with our experts, and those of the storage vendor, here is a set of guidelines that I am working on:
1. Never change the LUN number of a volume. Use LUN number mapping on the SAN storage to prevent LUN numbers from changing or being reordered when a volume is added or deleted.
2. Add space to ESX by adding a SAN volume and extending the VMFS file system across the new volume using ESX functionality.
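To make guideline 1 checkable, one option is to record the volume-to-LUN mapping before a storage change and compare it afterwards. The following is only a rough Python sketch: the volume identifiers and the dictionaries are made-up placeholders, and how you collect the real mapping from your array or hosts is left open.

```python
def changed_luns(before, after):
    """Return {volume_id: (old_lun, new_lun)} for volumes whose LUN number changed.

    `before` and `after` are hypothetical snapshots mapping a stable volume
    identifier (e.g. a serial number) to the LUN number it is presented as.
    """
    return {
        vol: (before[vol], after[vol])
        for vol in before.keys() & after.keys()
        if before[vol] != after[vol]
    }


# Illustrative data only: vol-c is new, vol-b was renumbered by mistake.
before = {"vol-a": 0, "vol-b": 1}
after = {"vol-a": 0, "vol-b": 2, "vol-c": 1}
print(changed_luns(before, after))  # {'vol-b': (1, 2)}
```

Any non-empty result means an existing volume was renumbered and the change should be backed out before the hosts rescan.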
Steps to add storage
1. Check that both paths are healthy on all ESX nodes.
2. Shut down one path between the storage and the ESX hosts.
3. Make the change to the storage (see guideline 1 above).
4. Verify all volumes are available on all hosts.
5. Double-check the SAN configuration and bring up the second path.
6. Verify all volumes are available.
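The verification in steps 4 and 6 could include a check that every volume shows up with the same LUN number on every path, since that mismatch is what bit us. Here is a minimal Python sketch of that comparison. The path names, volume identifiers and the input structure are assumptions for illustration; gathering the real per-path data from the hosts is not shown.

```python
def inconsistent_volumes(paths):
    """Find volumes that are missing or numbered differently on some path.

    `paths` is a hypothetical structure: {path_name: {volume_id: lun_number}}.
    Returns {volume_id: {path_name: lun_number_or_None}} for each problem
    volume, where None means the volume is not visible on that path.
    """
    all_vols = set()
    for mapping in paths.values():
        all_vols.update(mapping)

    problems = {}
    for vol in sorted(all_vols):
        seen = {name: mapping.get(vol) for name, mapping in paths.items()}
        # More than one distinct value means a mismatch or a missing volume.
        if len(set(seen.values())) > 1:
            problems[vol] = seen
    return problems


# Illustrative data only: vol-b has a different LUN number on each fabric.
paths = {
    "fabric-a": {"vol-a": 0, "vol-b": 1},
    "fabric-b": {"vol-a": 0, "vol-b": 2},
}
print(inconsistent_volumes(paths))
```

An empty result is what you want to see before bringing the second path back up and again once it is up.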
The business about going to a single path comes from the storage vendor.
Does this sound sensible? Anyone got any better ideas?