VMware Cloud Community
SurfControl
Enthusiast

LUNs trespass when rebooting an ESX server

We have the following setup:

10 ESX 4.0 U2 servers; the SATP is VMW_SATP_ALUA_CX and the path policy is Round Robin.

EMC CLARiiON CX4-480, FLARE 28.5.707, failover mode 4 for the ESX hosts.

(This setup is supported according to EMC document h1416.)

Rebooting an ESX server triggers the LUNs in the storage group to trespass.

Any idea why this is happening?

Thanks,

binoche
VMware Employee

This looks like an ESX issue. Could you please post the vmkernel messages? Thanks.

binoche, VCP, CCNA

SurfControl
Enthusiast

Nollaig from the EMC CLARiiON forum points out that there is an EMC KB article that describes exactly what is going on with this issue:

================================================================================================

ID: emc232355

Usage: 7

Date Created: 01/26/2010

Last Modified: 03/12/2010

STATUS: Approved

Audience: Customer

Knowledgebase Solution

Question: Using the VMware ESX native multipath plug-in (NMP) with a CLARiiON array.

Environment: Product: CLARiiON CX4 series

Environment: Product: VMware ESX Server

Environment: Product: VMware ESXi Server

Problem: LUNs are trespassing when ESX server is rebooted

Problem: LUNs are trespassing when ESXi server is rebooted

Problem: LUNs are trespassing when VMware server is rebooted

Problem: A reboot of one ESX or ESXi server might trespass some or all LUNs to SP A or to SP B. This only happens when the CLARiiON array is in ALUA mode (failovermode = 4) and the path selection plugin (PSP) is RR or MRU.

Change: A user either sets the policy for each device from the GUI or runs the esxcli nmp device setpolicy command.

Root Cause: The default path selection plugin (PSP) with native multipath plug-in (NMP) for the CLARiiON array is set to FIXED when using failovermode = 4.

The esxcli nmp device setpolicy command sets configuration for a particular device. During the boot, in order to apply the configuration, VMkernel has to create the device with the default configuration specified by the claim-rules (for example, PSP_FIXED) first. Only after that can the VMkernel apply the device-specific configuration (PSP_RR) to the device. In this particular case having PSP_FIXED claim the device for a brief period of time is enough to cause the unwanted path failovers.
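To see which default PSP a device will briefly get at boot, you can look up the default PSP that the claim rules assign to the SATP. Below is a minimal sketch of that lookup. It assumes the output of `esxcli nmp satp list` is whitespace-separated with the SATP name in the first column and the default PSP in the second; the helper name `default_psp_for_satp` and the sample text are illustrative and were not captured from a real host.

```shell
#!/bin/sh
# Print the default PSP for the SATP named in $1.
# Reads `esxcli nmp satp list`-style text on stdin.
# ASSUMPTION: whitespace-separated columns, SATP name first, default PSP second.
default_psp_for_satp() {
    awk -v satp="$1" '$1 == satp { print $2; found = 1 } END { exit !found }'
}

# Illustrative sample in the assumed layout (not real host output).
sample='Name                 Default PSP       Description
VMW_SATP_ALUA_CX     VMW_PSP_FIXED     Supports EMC CX that use the ALUA protocol
VMW_SATP_DEFAULT_AA  VMW_PSP_FIXED     Supports non-specific active/active arrays'

# Look up the SATP from this thread in the sample text.
printf '%s\n' "$sample" | default_psp_for_satp VMW_SATP_ALUA_CX
```

On a host this would presumably be run as `esxcli nmp satp list | default_psp_for_satp VMW_SATP_ALUA_CX`. If it reports VMW_PSP_FIXED while the devices are set to RR individually, that matches the situation the KB article describes.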

Fix: Use the following command to change the default PSP for VMW_SATP_ALUA_CX to VMW_PSP_RR instead of VMW_PSP_FIXED:

esxcli nmp satp setdefaultpsp --satp=VMW_SATP_ALUA_CX --psp=VMW_PSP_RR

If you want some of your devices to be managed by FIXED or MRU path selection policies, you can configure these devices separately using the vSphere GUI or the esxcli nmp device setpolicy command.

In this case all devices for CLARiiON in ALUA mode will first be claimed by the VMW_PSP_RR, and then some of them will be switched to VMW_PSP_FIXED or VMW_PSP_MRU.

This is safe because VMW_PSP_RR does not do failovers unless no "Active" paths are available.
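The two steps of the fix could be scripted roughly as follows. This is only a sketch: the `run` wrapper, the `DRY_RUN` variable and the device ID `naa.EXAMPLE_DEVICE_ID` are hypothetical and only there so the commands can be reviewed before anything is changed. With `DRY_RUN=1` (the default here) the commands are printed, not executed.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: make RR the default PSP for the ALUA CX SATP (command from the KB article).
set_default_rr() {
    run esxcli nmp satp setdefaultpsp --satp=VMW_SATP_ALUA_CX --psp=VMW_PSP_RR
}

# Step 2 (optional): override the PSP for one device.
# $1 = device ID, $2 = PSP name (e.g. VMW_PSP_FIXED or VMW_PSP_MRU).
set_device_psp() {
    run esxcli nmp device setpolicy --device "$1" --psp "$2"
}

set_default_rr
# naa.EXAMPLE_DEVICE_ID is a placeholder; substitute a real device ID.
set_device_psp naa.EXAMPLE_DEVICE_ID VMW_PSP_FIXED
```

Setting the default first means every CLARiiON ALUA device is claimed by RR at boot, so a per-device override to FIXED or MRU is applied afterwards, as the article explains.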
