VMware Cloud Community
Josito
Contributor

Disk reservation errors in a cluster in a box

Hi,

We sometimes have problems with Windows 2003 cluster-in-a-box setups, with errors like these in the cluster event log:

ClusSvc: Reservation of cluster disk 'Disk F:' has been lost

Ftdisk: The system failed to flush data to the transaction log

Delayed Write Failed: Windows was unable to save all the data for the file H:\sfbd\bds\MSSQL\LOG\ERRORLOG

This causes problems in the cluster, such as groups failing over, SQL Server database corruption, etc.

We have an EMC Symmetrix DMX-3 and ESX 3.0.1 with the latest patches.

I have checked the disk parameters, the driver for the 4 Gb QLogic HBA, etc., but the problem continues.

Any ideas? Has anyone seen a similar problem?

Thanks.

24 Replies
Texiwill
Leadership

Hello,

What type of disk is Disk F? RDM? vmdk? Which type of bus sharing are you using on your VM vSCSI controller?

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
Josito
Contributor

Hello,

The disks are VMDKs (not RDMs), the SCSI controller type is LSI Logic (the default), and bus sharing on SCSI controller 1 is set to Virtual (SCSI controller 0 is set to None).

Thanks.

Regards.
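As a rough illustration of where the settings described above live: in a VM's .vmx file, the controller type and bus-sharing mode appear as `scsiN.virtualDev` and `scsiN.sharedBus` keys. The sample below is a hypothetical excerpt (the disk file name `quorum.vmdk` is made up), and the parser is only a minimal sketch assuming simple `key = "value"` lines, not a full .vmx reader.

```python
import re

# Hypothetical .vmx excerpt matching the setup described above: boot disk on an
# unshared controller 0, shared VMDKs on controller 1 with bus sharing "virtual".
SAMPLE_VMX = """
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
scsi0.sharedBus = "none"
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "virtual"
scsi1:0.fileName = "quorum.vmdk"
"""

# Simple key = "value" lines; keys may contain dots and colons (e.g. scsi1:0.fileName).
LINE_RE = re.compile(r'^\s*([\w.:]+)\s*=\s*"([^"]*)"\s*$')
# A controller's "present" flag, e.g. scsi1.present (per-disk keys like scsi1:0.present don't match).
CONTROLLER_RE = re.compile(r"(scsi\d+)\.present")


def parse_vmx(text):
    """Return a dict of lower-cased key -> value for simple key = "value" lines."""
    settings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def controller_summary(settings):
    """Map each present SCSI controller to a (virtualDev, sharedBus) tuple."""
    summary = {}
    for key, value in settings.items():
        match = CONTROLLER_RE.fullmatch(key)
        if match and value.upper() == "TRUE":
            ctrl = match.group(1)
            summary[ctrl] = (
                settings.get(ctrl + ".virtualdev", "unknown"),
                # Assumption: a missing sharedBus key is treated as no bus sharing.
                settings.get(ctrl + ".sharedbus", "none"),
            )
    return summary


if __name__ == "__main__":
    for ctrl, (dev, shared) in sorted(controller_summary(parse_vmx(SAMPLE_VMX)).items()):
        print(f"{ctrl}: virtualDev={dev}, sharedBus={shared}")
```

Running something like this against each node's .vmx is one way to confirm that both nodes agree on which controller is shared and in which mode.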

Josito
Contributor

Does anyone have any ideas? Perhaps we should open a case with VMware support.

Thanks

Regards

rluaces
Contributor

Hi,

We have the same problem with ESX 3.0.2. Any update?

MrHPUX
Contributor

Any update available for this issue?

Thanks,

Brian

<*(((>< er

jhanekom
Virtuoso

You say "cluster in box", but how many ESX servers are involved in this? Also, how many Windows nodes in the MSCS cluster?

IOWDave
Enthusiast

Good Afternoon,

Did you zero the disks after you had created them?

Regards,

Dave.

Texiwill
Leadership

Hello,

VI3 requires that ANY shared drive be an RDM. There is no solution other than to move to an RDM-type device. You have to read the http://www.vmware.com/pdf/vi3_301_201_mscs.pdf document very carefully; it does not come right out and say this. If you only have one ESX server and no remote datastores, you can set up a SCSI generic device to get the same capability. Note that this is true for Red Hat shared-disk clusters as well as MSCS.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education. As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

jhanekom
Virtuoso

I'd have to disagree with that statement, Edward. It's quite possible to do a shared disk for a true (single-host) cluster-in-a-box. In fact, the documentation is quite clear that for a single-node system, this is how you should go about it.

There is no requirement for shared disk resources to be RDMs. There are several documents that actively promote RDMs over the old 2.x RAW devices (aka generic SCSI devices), but that was because back in the early 2.x days RAW devices were the order of the day - RDMs didn't exist. All that of course changed with the introduction of 2.5.

To the best of my knowledge it's not possible to use a generic SCSI disk device on the local system in ESX 3. I've seen posts here where that was attempted and in most cases the guy trying it couldn't get it to work, even on a separate SCSI controller. The conclusion has usually been that RDMs can only be had on a SAN, though I've not seen official documentation on that either.

Ian-Cumbers
Contributor

Josito,

I'm having the same errors, although not all the errors are necessarily a problem.

My VI includes ESX 3.0.2 servers, with VC recently upgraded to 2.5. We have a number of W2K and W2K3 Clusters, some native to ESX 3.x.x and others migrated over from our previous 2.5.x hosts.

All our clusters are 'Cluster in a Box' scenarios; the shared disks are VMDKs with the dedicated SCSI controller set to 'Virtual'. With the move to VI3, we also disabled DRS for the cluster nodes, just in case. Some of the scenarios I've described (e.g. W2K) aren't supported by VMware, but anecdotal evidence, from here and elsewhere, suggests that in most cases it works fine. None of the disks were zeroed; I'm not sure that this was even an option under 2.5.x (where most of my clusters were originally built).

All our clusters suffer from FTDisk (Event ID 57) and Delayed Write Failed (Event ID 50) errors. We thought this really was a SAN/LUN problem and so created small dedicated LUNs for the cluster nodes (rather than sharing bigger LUNs with other VMs); none of this made any difference.

Now for the good news! See: - according to MS, there is no Resolution because this is known behaviour and doesn't mean there is a problem.

Now for the bad news :( Since moving to ESX 3.0.2 (Build 61618), we've been following the instructions for W2K3 clusters in the (increasingly out of date) white paper from VMware (as per the post by Texiwill). Now, in addition to the (harmless) errors 50 and 57, we are also getting the same reservation failures that you've noticed. Unlike the 50s and 57s, the reservation failures lead to cluster service failures (between nodes), and eventually the service stops. It restarts, but the process seems never-ending.

Apart from the move to 3.0.2, the only difference in the way we build clusters is that we are now following the VMware doc more closely, even where it's contradictory and unclear. So the local storage for each VM is on a host's 'LocalStorage' datastore (not sure how VMware plans to make that work in 3i), while the shared disks go onto a SAN LUN (EVA 6K). The shared disks are created and zeroed and then added to the VMs.

I've just built a brand new cluster with an older W2K3 build that we have (to see if an MS hotfix in a more recent company server images/build had broken anything) - no joy, same reservation problems. I'm now going to repeat the exercise on my legacy 2.5.x server....

Is anyone else getting just the 50 and 57 events (which do seem harmless)?

Ian
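A minimal triage sketch of the distinction drawn above: events 50 and 57 are treated as noise, while the ClusSvc "Reservation of cluster disk ... has been lost" event (Event ID 1038, as quoted later in this thread) is what actually hurts the cluster. The benign/serious split simply follows Ian's reading of Microsoft's guidance and is an assumption, not an official classification; how you export the event IDs from the System log is left out, and the sample list is hypothetical.

```python
from collections import Counter

# Assumption: 50 (Delayed Write Failed) and 57 (Ftdisk flush failure) are benign,
# per the reading of Microsoft's guidance given above.
BENIGN_IDS = {50, 57}
# ClusSvc: "Reservation of cluster disk ... has been lost"
RESERVATION_LOST_ID = 1038


def triage(event_ids):
    """Count events and report whether any disk-reservation losses occurred."""
    counts = Counter(event_ids)
    benign = sum(counts[i] for i in BENIGN_IDS)
    lost = counts[RESERVATION_LOST_ID]
    other = sum(counts.values()) - benign - lost
    return {
        "benign": benign,
        "reservation_lost": lost,
        "other": other,
        "needs_attention": lost > 0,
    }


if __name__ == "__main__":
    # Hypothetical sample of event IDs pulled from a cluster node's event log.
    sample = [57, 50, 50, 57, 1038, 50]
    print(triage(sample))
```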

Ian-Cumbers
Contributor

Texiwill,

I'm loath to disagree with a published author, particularly one with 2,650 more posts in these forums than I have... but:

Page 15 says '...using either a virtual disk or a remote LUN using ...(RDM)...' and Page 16, table 1-1 - Shared Storage Summary shows that 'Virtual disks' are allowed for the 'Cluster in a Box' scenario. Or am I missing something?

mdcarson
Contributor

We recently migrated WS2K3 VMs from ESX 2.x hosts to new ESX 3.0.2 hosts. Two of the VMs we migrated were part of a Microsoft cluster, and they were working fine on the old hosts. Now we're getting the "Reservation of cluster disk... has been lost" errors. I noticed we're also getting occasional cluster communication lost errors, but not nearly as frequently as the disk problems. In the old environment, the shared virtual disks were on locally attached storage and both VMs were hosted by the same server. The new environment has five host servers in the cluster, and the datastores are on a SAN.

In searching for a resolution, I found this post, but I also found this KB document: "Insight Manager causes excessive SCSI reservation conflicts" (1003534), and I wondered if it might be related. All of our VMware hosts are HP and have the 7.8 agents installed. Is anyone else experiencing this issue on HP hardware?

Ian-Cumbers
Contributor

For reference, our VI is based on:

HP DL 380|385 and 580|585 servers, single QLogic HBA going through a Brocade switch into an HP EVA6k.

We don't currently have the HP Insight Agents installed.

Gu5
Contributor

Any updates on this?

I'm having the same issue, with our clustered disks having read/write errors. I was able to pinpoint the issue to the SCSI controller: in non-shared mode (bus sharing None) it works fine, but as soon as it is in virtual or physical mode the disks get I/O errors. I've contacted VMware and been told it must be a bug and they are looking into it, but we really need a fix as quickly as possible.

Ian-Cumbers
Contributor

Gu5,

We've just finished our upgrade to 3.5.0, which I believe is now the only VMware supported platform for MSCS. However, we had to find a solution quickly and that appears to be to host the VM cluster on a single host and ensure that no other hosts can access the LUN that holds the shared VMDKs. When we were doing our testing (on VMware 3.0.x) this was the only way we could get the cluster nodes to behave properly without any disk errors. This was acceptable for us in the short term (with Dev/Test environments), but it's hardly a solution for production systems.

Now that we've got to 3.5.0, we'll re-run our tests to see if we can get it to run in a stable error free manner, with multiple hosts - letting us have cluster nodes on different hosts.
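The short-term workaround described above has two conditions: both cluster nodes run on one host, and no other host can access the LUN holding the shared VMDKs. A minimal sketch of checking those conditions against an inventory follows; the inventory shape, host names (`esx01`, `esx02`), node names, and LUN identifiers are all hypothetical, and gathering the real LUN visibility from your hosts or switch zoning is not shown.

```python
def check_single_host_isolation(lun_visibility, node_hosts, shared_lun):
    """Return a list of problems with the single-host workaround (empty means it holds).

    lun_visibility: dict of host name -> set of LUN identifiers the host can access
    node_hosts:     dict of cluster node (VM) name -> host it is running on
    shared_lun:     identifier of the LUN holding the shared VMDKs
    """
    problems = []
    hosts_with_nodes = set(node_hosts.values())
    # Condition 1: all cluster nodes on the same host.
    if len(hosts_with_nodes) > 1:
        problems.append("cluster nodes run on more than one host: " + ", ".join(sorted(hosts_with_nodes)))
    # Condition 2: only the host running the nodes can reach the shared LUN.
    hosts_seeing_lun = {host for host, luns in lun_visibility.items() if shared_lun in luns}
    for host in sorted(hosts_seeing_lun - hosts_with_nodes):
        problems.append(f"{host} can access {shared_lun} but runs no cluster node")
    for host in sorted(hosts_with_nodes - hosts_seeing_lun):
        problems.append(f"{host} runs a cluster node but cannot access {shared_lun}")
    return problems


if __name__ == "__main__":
    # Hypothetical inventory: two hosts, with the shared-VMDK LUN presented to esx01 only.
    visibility = {"esx01": {"LUN10", "LUN20"}, "esx02": {"LUN20"}}
    nodes = {"node-a": "esx01", "node-b": "esx01"}
    print(check_single_host_isolation(visibility, nodes, "LUN10") or "workaround conditions met")
```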

aenagy
Hot Shot

We are encountering similar errors:

--- start error message ---

Event Type: Error
Event Source: ClusSvc
Event Category: Physical Disk Resource
Event ID: 1038
Date: 08/21/2008
Time: 3:43:12 PM
User: N/A
Computer:
Description:
Reservation of cluster disk 'Disk S:' has been lost. Please check your system and disk configuration.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

--- end error message ---


We have a dedicated standalone host (ESX Server 3.0.2 Update 1 on an IBM x3650), not part of a VMware cluster, for virtual machines running MSCS. These virtual machines were migrated from an ESX Server 2.5.x host using VMware Converter 3.0.2, and these problems did not occur on the older host. The virtual machines and shared virtual disks are all on the same SAN LUN (EMC CLARiiON).

Any updates would be greatly appreciated.

dandeane
Enthusiast

I too am having the same issues, running Update 3. Are you running Update 3? Did you get the problem sorted out? How did your testing work out? Any updates or feedback you have would be great!

Thanks....

Dan

sdonia
Contributor

Same issue here running a CiB setup. Has anyone contacted VMware yet?

Texiwill
Leadership

Hello,

Moved to Virtual Machine and Guest OS forum.


Best regards,
Edward L. Haletky
VMware Communities User Moderator, VMware vExpert 2009
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
Blue Gears and SearchVMware Pro Blogs -- Top Virtualization Security Links -- Virtualization Security Round Table Podcast
