We've been seeing several reservation errors in our ESX
environment.
Environment details:
IBM BladeCenter with LS20 blades, QLogic QLA2xxx 2Gb adapters
Brocade Fibre Channel switch module
Connected to IBM SAN Volume Controller (SVC) ver 4.2.1.5
Running SVC Global Mirror to another site
10 nodes running ESX 3.5.0 Update 2
12 LUNs assigned to the cluster
We don't see any fabric events on the switches, and the SVC is not
showing any errors either. The VMware ESX logs show numerous
reservation errors on the LUNs, and a few VMs have been
corrupted.
Are there any tuning parameters in the QLogic HBAs that might help?
Anyone out there with similar experience?
Thanks in advance!
Sample log entries below:
vmkernel:Feb 7 23:57:01 vmesx09 vmkernel: 0:03:26:09.775 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel:Feb 7 23:57:06 vmesx09 vmkernel: 0:03:26:15.371 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel:Feb 7 23:57:12 vmesx09 vmkernel: 0:03:26:21.153 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel:Feb 7 23:57:18 vmesx09 vmkernel: 0:03:26:27.025 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 4 17:12:32 vmesx09 vmkernel: 1:23:07:39.378 cpu2:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 4 17:12:38 vmesx09 vmkernel: 1:23:07:45.362 cpu0:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 4 21:57:40 vmesx09 vmkernel: 2:03:52:48.163 cpu0:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 5 01:27:37 vmesx09 vmkernel: 2:07:22:44.686 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 5 02:42:49 vmesx09 vmkernel: 2:08:37:57.252 cpu2:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
vmkernel.2:Feb 5 02:57:35 vmesx09 vmkernel: 2:08:52:42.747 cpu0:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict
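For anyone wanting to line these entries up against backup windows or Global Mirror activity, the sample above could be tallied per host and hour with something like the sketch below. It assumes each log entry sits on a single line in the vmkernel log, optionally prefixed with the file name that grep adds (as in the sample). The names LINE_RE, count_conflicts and SAMPLE are made up for this illustration and are not part of any VMware tool.

```python
import re
from collections import Counter

# Matches the syslog-style entries shown above, with or without the
# "vmkernel:" / "vmkernel.2:" file-name prefix that grep adds.
LINE_RE = re.compile(
    r"^(?:[\w.]+:)?"                                   # optional grep prefix
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"  # e.g. "Feb 7"
    r"(?P<hour>\d{2}):\d{2}:\d{2}\s+"                  # e.g. "23:57:01"
    r"(?P<host>\S+)\s+vmkernel:.*"                     # host name, then message
    r"SCSI reservation conflict"
)

def count_conflicts(lines):
    """Count reservation conflicts keyed by (host, month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            key = (m["host"], m["month"], int(m["day"]), int(m["hour"]))
            counts[key] += 1
    return counts

# A few of the entries from the post, joined onto one line each.
SAMPLE = [
    "vmkernel:Feb 7 23:57:01 vmesx09 vmkernel: 0:03:26:09.775 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict",
    "vmkernel:Feb 7 23:57:06 vmesx09 vmkernel: 0:03:26:15.371 cpu3:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict",
    "vmkernel.2:Feb 4 17:12:32 vmesx09 vmkernel: 1:23:07:39.378 cpu2:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict",
    "Feb 4 17:12:38 vmesx09 vmkernel: 1:23:07:45.362 cpu0:1039)WARNING: FS3: 4785: Reservation error: SCSI reservation conflict",
]

if __name__ == "__main__":
    for key, n in sorted(count_conflicts(SAMPLE).items()):
        print(key, n)
```

Bursts that cluster in the same hour across several hosts would point toward a scheduled job or replication activity rather than a single misbehaving HBA.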
Hi,
There are multiple parameters influencing this kind of behaviour.
For starters:
How big are your LUNs, and how many VMs are on each LUN?
Do you use VCB for backups at the time the problem occurs?
The fact is that having too many VMs on different hosts all attempting to write to the same LUN will
result in reservation conflicts.
Try to avoid placing high-I/O VMs on the same LUN.
Peter
Along with what Peter said, just a couple of other things. One would be to check the firmware of your QLogic HBAs. Also, and probably most important, I think you should take a look at going to U3 and applying a couple of critical U3 QLogic patches for ESX.
Just to add to the helpful posts above: if you have any monitoring software (hardware vendor agents?) that looks at storage or storage HBAs for health, utilization, or performance, you might want to turn it off. Such agents are a classic cause as well.
Hello,
Check out an excerpt from my book that covers many of the causes of SCSI reservation requests. There is quite a bit happening under the covers that also falls into these categories.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
Blue Gears and SearchVMware Pro Blogs -- Top Virtualization Security Links -- Virtualization Security Round Table Podcast
We were on U3 and then rolled back to U2 because U3 is not yet listed in the SVC support matrix.
We upgraded the QLogic BIOS from 1.43 to 1.47 over the weekend, and when we turned Global Mirror back on, we immediately saw the errors on a couple of servers again.
--Efren
Hello,
SCSI reservation conflicts can be related to your hardware, but generally are not unless you have overloaded your SAN.
SCSI reservation conflicts occur when VMFS metadata updates cannot complete in the required time. Metadata updates happen almost all the time, and the excerpt will help you diagnose ones that could be related to what you or your admins may be doing, as well as anything that could be happening under the covers. For example, if you are logging a lot of VM data, you will get a reservation for every 15 MB of allocated VMFS space.
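To put the 15 MB figure in perspective, here is a rough back-of-the-envelope sketch. It simply takes the number quoted above at face value (one reservation per 15 MB of newly allocated space) and is not a measured value; the function name and the example growth rates are made up for illustration.

```python
import math

# Assumption from the post above: one reservation per 15 MB of newly
# allocated VMFS space. Treat this as the thread's figure, not a spec.
MB_PER_RESERVATION = 15

def reservations_for_growth(growth_mb):
    """Roughly estimate reservations triggered by growth_mb of file growth."""
    return math.ceil(growth_mb / MB_PER_RESERVATION)

if __name__ == "__main__":
    # Hypothetical example: 10 VMs on one LUN, each growing files by 90 MB/hour.
    vms, mb_per_vm_per_hour = 10, 90
    total_mb = vms * mb_per_vm_per_hour
    print(reservations_for_growth(total_mb), "reservations per hour on that LUN")
```

With those made-up numbers the estimate comes to 60 reservations per hour on a single LUN, which shows how quickly growing files on a shared LUN add to the metadata traffic.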
Best regards,
Edward L. Haletky
In layman's terms:
Usually a reservation conflict occurs because the storage or LUN holding the reservation is unable to release it for the next host, which then sees a reservation conflict.
To get out of this situation, we need to either perform a LUN reset so that the reservations are cleared, or perform a rolling reboot of the hosts.
Hope this simplifies it.