I'm pretty stumped on this one, so I'm hoping someone can help. I have an ESXi 3.5 server (Dell R900) with two VMs running on it, both Server 2003, one of them running Exchange 2003. The R900 is connected to an EqualLogic PS6000E SATA SAN using the software iSCSI initiator. The C drives for both VMs are stored on one VMFS LUN, and both VMs have RDMs to separate LUNs for data.
I'm using dual switches between the server and the SAN, in a mesh configuration for redundancy. During failover testing, I pull the power cord on one of the switches while running a vmkping on the ESX host to measure how long failover takes (roughly 10 seconds), then check the VMs to make sure the failover didn't crash them. However, the VM running Exchange hangs and never responds (I have to power it off), whereas the other VM pauses for maybe 20 to 30 seconds and then resumes normally. I've checked both machines to make sure the disk TimeOutValue is set to 60, and even increased it to 120 on the Exchange server. No luck.
Hoping someone may have some ideas...
Thanks in advance
What's the configuration of the vSwitch you use to connect to iSCSI?
I would use 2 pNICs in active/standby with Failback set to No, and Network Failover Detection set to Beacon Probing.
Also, when the network fails over, I've seen the vmkping response on the host differ from the response seen on the actual VM. I never got to the bottom of why, but it's worth repeating your test while pinging the iSCSI target from inside the VM.
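If you want a number rather than eyeballing ping output, something like the rough sketch below could be run from inside the VM during the test. It is only an illustration: the function names (`probe_target`, `longest_outage`, `watch`) are made up for this post, it probes with a TCP connect to the default iSCSI port 3260 instead of ICMP, and it assumes the VM has a Python 3 interpreter and a route to the iSCSI network.

```python
import socket
import time


def probe_target(host, port=3260, timeout=1.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def longest_outage(samples):
    """Return the longest outage, in seconds, found in the samples.

    samples is a time-ordered list of (timestamp_seconds, ok) tuples.
    An outage runs from the first failed probe to the next successful one.
    """
    longest = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # outage ends
            outage_start = None
    # If the target never came back, count up to the last sample taken.
    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


def watch(host, duration=120.0, interval=1.0):
    """Probe host once per interval for duration seconds; return longest outage."""
    samples = []
    end = time.time() + duration
    while time.time() < end:
        samples.append((time.time(), probe_target(host)))
        time.sleep(interval)
    return longest_outage(samples)
```

Start `watch("<your SAN group IP>")` in the VM, pull the switch power, and compare the result with what vmkping shows on the host. The resolution is limited by the probe interval plus the connect timeout, so treat the figure as approximate.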
Thanks for the reply, but I think I nabbed it. It turned out to be caused by the Dell OpenManage application (this VM was converted from an OEM box).
I disabled the services associated with the app and did another failover test, and this time the VM stayed up. I then uninstalled the app completely, tested it a few more times, and it worked like a charm.
Thanks anyway for the help!