Hello all,
we are experiencing serious SCSI reservation issues in our ESX 3.0.1 / VC 2.0.1 environment.
This is our setup and the whole story:
Host hardware:
\- 2 IBM xSeries 445 (each with 8 SingleCore-CPUs and 32 GB RAM)
\- 3 HP ProLiant DL585 (each with 4 DualCore-CPUs and 32 GB RAM)
\- 2 HP ProLiant DL580 (each with 4 SingleCore-CPUs and 16 GB RAM)
We started with all servers running ESX 2.5.x attached to an EMC Symmetrix 8530. All servers used three 600 GB LUNs on this box. All have two QLogic HBAs in them. No issues.
Then we started our migration to ESX3. At the same time we also needed to migrate to new SAN storage: six 400 GB LUNs on an HP XP12000. We used the brand new "VMotion with storage relocation" feature to do both migrations. At the beginning this worked really well.
So we re-installed all hosts one after the other with ESX3, attached the new storage LUNs to them (in addition to the old ones) and migrated the VMs from the not-yet-upgraded hosts to the already-upgraded hosts and the new storage.
We started with the three DL585 and were very pleased with the speed and the reliability of the process.
However, when we re-installed the first IBM host the trouble began. All sorts of VM-related procedures (e.g. storage relocation, hot and cold, powering on VMs, VMotion, creating new VMs) failed with all sorts of error messages in VirtualCenter. Looking at the vmkernel logs of the hosts we discovered the reason for this: excessive SCSI reservation conflicts. The messages look like this, for example:
Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:34.249 cpu4:1045)WARNING: SCSI: 5519: Failing I/O due to too many reservation conflicts
Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:34.249 cpu4:1045)WARNING: SCSI: 5615: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba2:0:0. residual R 919, CR 0, ER 3
Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:39.086 cpu4:1045)FSS: 343: Failed with status 0xbad0022 for f530 28 2 453782fc 6b8bc9e9 1700770d 1d624ca 4 4 1 0 0 0 0 0
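To see which host and path the conflicts pile up on, lines like the ones above can be tallied with a small script. This is only a rough sketch: the regular expression is an assumption based on the three sample lines quoted here, and the names `CONFLICT_RE` and `count_conflicts` are made up for illustration, so other vmkernel message variants may need a different pattern.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the sample lines quoted in this post:
# syslog timestamp, host name, "vmkernel:", then the conflict status line
# naming the affected path (e.g. vmhba2:0:0).
CONFLICT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+vmkernel:.*"
    r"SCSI reservation conflict.*?for (?P<path>vmhba\d+:\d+:\d+)"
)


def count_conflicts(lines):
    """Count 'status SCSI reservation conflict' lines per (host, path)."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("path"))] += 1
    return counts


# The three log lines quoted above, used as sample input.
SAMPLE = [
    "Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:34.249 cpu4:1045)WARNING: "
    "SCSI: 5519: Failing I/O due to too many reservation conflicts",
    "Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:34.249 cpu4:1045)WARNING: "
    "SCSI: 5615: status SCSI reservation conflict, rstatus 0xc0de01 for "
    "vmhba2:0:0. residual R 919, CR 0, ER 3",
    "Nov 14 13:29:43 frasvmhst06 vmkernel: 0:00:03:39.086 cpu4:1045)FSS: 343: "
    "Failed with status 0xbad0022 for f530 28 2 453782fc 6b8bc9e9 1700770d "
    "1d624ca 4 4 1 0 0 0 0 0",
]

if __name__ == "__main__":
    for (host, path), n in sorted(count_conflicts(SAMPLE).items()):
        print(host, path, n)
```

Only the "status SCSI reservation conflict" line carries a path, so the "Failing I/O" and FSS lines are deliberately not counted. To use it on a real log, pass an open file object instead of `SAMPLE`.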
Things we have tried so far to make it better:
\- filed a SR with VMware. No helpful answers yet.
\- checked the firmware code of the XP12000. It is the latest: 50.07.64.
\- distributed SAN load on the two HBAs in each host (three LUNs fixed on the first path, the other three fixed on the second). This helped a lot(!), but we still had frequent reservation conflicts.
\- updated all HBAs to the latest EMC-supported BIOS (version 1.47). Did not change anything.
\- doubled the HBA's queue depth to 64. Doesn't seem to help.
In the meantime we have updated all seven hosts and migrated all 124 VMs to the new storage. The old EMC storage is still connected to all hosts but is unused. We even unloaded the VMFS2 driver as advised somewhere in the SAN configuration guide. So, everything should be quiet now. However, we still see sporadic SCSI reservation conflicts, although there is no storage relocation or VMotion etc. in progress! Even if we just reboot a host it will generate these errors when initializing its SAN storage access.
What's wrong here? Are we already driving VMware to its limits by having 7 hosts accessing 6 LUNs concurrently?
Is it the IBM hardware? Is it ESX3 not properly releasing SCSI locks?
I'd love to read comments from people who have similar problems, maybe even with similar hardware configurations, or better: no issues with a similar hardware configuration (esp. the IBM hosts accessing an XP12000).
\- Andreas
Hi,
Similar to you, we have a Sun 9990, and our servers are Dell 6850 and Dell 2950. For three months everything worked very well, but now we have a problem very similar to yours (described in your message from Jan 23 2007).
I suppose that you have solved your problems.
Could you tell me exactly what you did?
Andrzej
We've been able to make the change to have Option 19 available but have not tested it yet. We're installing two new ESX servers and will test with them before we attempt the production servers. Once this is successful we will post what we had to do. If you contact your SUN or HDS SE and reference this article: http://kb.vmware.com/vmtnkb/search.do?cmd=displayKC&docType=kc&externalId=3408142&sliceId=SAL_Public..., they should be able to assist you. Reference the HDS alert number.
Okay, since we've made the change to our SAN and set up the new ESX servers we've had no errors. We've been performing tests such as VMotioning every few minutes, cloning VMs, shutting down and powering up and have had no issues. On the existing ESX servers without the change we're lucky if we can power a VM off without a problem. We'll be testing the Consolidated Backup next week before we can say that we're satisfied that this fixed our issue. Once we do that we have to move all of our VMs from the old servers to the new ones and then format and reinstall the old ones. Hope this helps!
Hi all
Any news on this issue?
I'm going to deploy a big infrastructure on an HDS 9990 and one XP12000 and I want to know if there are new patches/workarounds for this issue.
Thank you.
Fabio.
Hi Fabio,
this specific issue (SCSI reservation conflicts using LUSE LUNs on HDS-based SAN arrays) is resolved; a fix/workaround is documented here:
http://kb.vmware.com/selfservice/viewContent.do?externalId=8411304
My advice is: Avoid using LUSE (Logical Unit Size Expansion). If you cannot avoid it, follow the KB article above.
Regards
Andreas
Issue resolved by VMware. See:
http://kb.vmware.com/selfservice/viewContent.do?externalId=8411304
Please note that this is not a general fix for all SCSI reservation problems but only for the configurations that are mentioned there (HDS-based array like the XP12000 AND using LUSE).
Thank you very much, I will keep this KB handy when I do the deployment.
Have a nice day.
Fabio
Hello,
Are you familiar with similar symptoms on an EVA8000-based SAN? I experience SCSI reservation errors with VCB. When I start a bunch of VCB jobs in parallel (which the integration modules do), I get a mess. If I start them one after the other, there are no conflicts. I am running everything on test systems, so load is very, very low. I have an HP EVA 8000 with all relevant firmware (VCS 6) and two DL580 G2s with QLogic adapters (the 2 Gb ones). My VCB proxy host has an Emulex card. Could this be the problem?
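The one-after-the-other approach can be sketched as a tiny wrapper that waits for each job to finish before starting the next. This is only an illustration: `run_serially` is a made-up helper, and the entries in `JOBS` are harmless placeholders standing in for whatever backup command line is actually used, not real VCB invocations.

```python
import subprocess
import sys


def run_serially(commands):
    """Run each command to completion before starting the next.

    Returns the list of exit codes in the order the commands were run.
    """
    exit_codes = []
    for cmd in commands:
        # subprocess.run blocks until the child process has exited,
        # so no two jobs ever overlap.
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
    return exit_codes


# Placeholder jobs (assumption): each one just prints a line. Replace them
# with the real backup command lines for your environment.
JOBS = [
    [sys.executable, "-c", "print('backup job 1 (placeholder)')"],
    [sys.executable, "-c", "print('backup job 2 (placeholder)')"],
]

if __name__ == "__main__":
    print(run_serially(JOBS))
```

This trades total backup time for fewer hosts competing for the same LUN at once, which is the behaviour described above; it does not address the underlying cause of the conflicts.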
BR,
Ronald
We have made the change to host mode 19 and still have problems with reservation conflict messages. The cluster is very, very slow.
Some hosts are freezing (it seems to be because of the slow I/O).
Some files I just can't delete from the SAN:
\[root@s6004vm06 SAN01]# rm -rf teste/
rm: cannot remove `teste//pleiade.petrobras.biz.vmx': Resource deadlock avoided
rm: cannot remove `teste//vmware-2.log': Resource deadlock avoided
rm: cannot remove `teste//pleiade.petrobras.biz-049c57e3.hlog ': Resource deadlock avoided
I really don't know what to do anymore. I opened an SR and got nothing; now I'm opening a case with Hitachi.
Hi all
Any news on this topic?
Have you received any firmware updates from HP/Hitachi?
Thanks a lot.
No, there are no firmware updates available.
We did not experience any more issues since we implemented the option 19 thing.
So, I guess in your case something more or something different is wrong.
If you are still able to use VMotion then try to reboot each ESX host one after the other.
\- Andreas
Hi all
I'm seeing this error while doing a VCB Backup:
SanMpAIOMgrRWv: Too many SCSI reservation conflicts.
My setup is:
ESX 3.0.2
VCB 1.0.3
two XP12000 (patched and configured as per the KB you referenced)
Is anyone experiencing this problem?
Thank you.
F.