Hi,
We sometimes have problems with cluster-in-a-box setups on Windows 2003, with errors in the cluster's event log such as:
ClusSvc: Reservation of cluster disk 'Disk F:' has been lost
Ftdisk: The system failed to flush data to the transaction log
Delayed Write Failed: Windows was unable to save all the data for the file H:\sfbd\bds\MSSQL\LOG\ERRORLOG
This causes problems in the cluster, such as group failovers, SQL Server database corruption, etc.
We have a Symmetrix DMX-3 and ESX 3.0.1 with the latest patches.
I have checked the disk parameters, the 4 Gb QLogic HBA driver, etc., but the problem continues.
Any ideas? Has anyone had a similar problem?
Thanks.
Hello,
What type of disk is Disk F? RDM? vmdk? Which type of bus sharing are you using on your VM vSCSI controller?
Best regards,
Edward
Hello,
The disks are VMDKs (not RDMs), the SCSI controller type is LSI Logic (the default), and bus sharing on SCSI controller 1 is set to Virtual (SCSI controller 0 is set to None).
Thanks.
Regards.
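For anyone comparing configurations, the controller settings described above can also be read straight from the VM's .vmx file. The following is only a minimal Python sketch: it assumes the usual `scsiN.sharedBus` and `scsiN.virtualDev` keys, and the sample text, the `quorum.vmdk` file name and the `controller_settings` function are made up for illustration.

```python
import re

# Hypothetical excerpt of a .vmx file; a real file contains many more keys.
SAMPLE_VMX = '''
scsi0.present = "true"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
scsi1.present = "true"
scsi1.sharedBus = "virtual"
scsi1.virtualDev = "lsilogic"
scsi1:0.present = "true"
scsi1:0.fileName = "quorum.vmdk"
'''

# Matches controller-level keys only (scsi1.sharedBus), not disk keys (scsi1:0.fileName).
_LINE = re.compile(
    r'^\s*(scsi\d+)\.(sharedBus|virtualDev)\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE,
)

def controller_settings(vmx_text):
    """Return {controller: {"sharedbus": ..., "virtualdev": ...}} from .vmx text."""
    result = {}
    for line in vmx_text.splitlines():
        m = _LINE.match(line)
        if m:
            ctrl = m.group(1).lower()
            key = m.group(2).lower()
            result.setdefault(ctrl, {})[key] = m.group(3).lower()
    return result

settings = controller_settings(SAMPLE_VMX)
for ctrl in sorted(settings):
    # A controller with no sharedBus key is reported here as "none" (an assumption).
    print(ctrl, settings[ctrl].get("sharedbus", "none"), settings[ctrl].get("virtualdev", "?"))
```

With the sample above, scsi0 shows up as not shared and scsi1 as shared in virtual mode, which is the layout described in this post.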
Does anyone have any idea? Perhaps we should raise this with VMware support.
Thanks
Regards
Hi,
We have the same problem with ESX 3.0.2. Is there any update available for this issue?
Thanks,
Brian
You say "cluster in box", but how many ESX servers are involved in this? Also, how many Windows nodes in the MSCS cluster?
Good Afternoon,
Did you zero the disks after you had created them?
Regards,
Dave.
Hello,
VI3 requires that ANY shared drive be an RDM. There is no solution other than to move to an RDM type device. You have to read the document at http://www.vmware.com/pdf/vi3_301_201_mscs.pdf very carefully; it does not come right out and say this. If you only have one ESX server and no remote datastores, you can set up a SCSI Generic device to get the same capability. Note that this is true for Red Hat shared-disk clusters as well as MSCS.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education. As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization
I'd have to disagree with that statement, Edward. It's quite possible to do a shared disk for a true (single-host) cluster-in-a-box. In fact, the documentation is quite clear that for a single-node system, this is how you should go about it.
There is no requirement for shared disk resources to be RDMs. There are several documents that actively promote RDMs over the old 2.x RAW devices (aka generic SCSI devices), but that was because back in the early 2.x days RAW devices were the order of the day - RDMs didn't exist. All that of course changed with the introduction of 2.5.
To the best of my knowledge it's not possible to use a generic SCSI disk device on the local system in ESX 3. I've seen posts here where that was attempted and in most cases the guy trying it couldn't get it to work, even on a separate SCSI controller. The conclusion has usually been that RDMs can only be had on a SAN, though I've not seen official documentation on that either.
Josito,
I'm having the same errors, although not all the errors are necessarily a problem.
My VI includes ESX 3.0.2 servers, with VC recently upgraded to 2.5. We have a number of W2K and W2K3 Clusters, some native to ESX 3.x.x and others migrated over from our previous 2.5.x hosts.
All our clusters are 'Cluster in a Box' scenarios; shared disks are VMDKs with the dedicated SCSI controller set to 'Virtual'. With the move to VI3, we also disabled DRS for the cluster nodes, just in case. Some of the scenarios I've described (e.g. W2K) aren't supported by VMware, but anecdotal evidence, from here and elsewhere, suggests that in most cases it works fine. None of the disks were zeroed - I'm not sure that this was even an option under 2.5.x (where most of my clusters were originally built).
All our clusters suffer from FTDisk (Event ID 57) and Delayed Write Failed (Event ID 50). We thought this really was a SAN / LUN problem and so created small dedicated LUNs for cluster nodes (rather than sharing bigger LUNs with other VMs) - none of which made any difference.
Now for the good news! See: - according to MS, there is no Resolution because this is known behaviour and doesn't mean there is a problem.
Now for the bad news
Since moving to ESX 3.0.2 (Build 61618), we've been following the instructions for W2K3 clusters in the (increasingly out-of-date) white paper from VMware (as per the post by Texiwill). In addition to the (harmless) events 50 and 57, we are now getting the same 'Reservation' failures that you've noticed. Unlike the 50s and 57s, the reservation failures lead to cluster service failure (between nodes) and eventually the service stops. It will restart, but the cycle seems to be never-ending.
Apart from the move to 3.0.2, the only difference in the way we build Clusters is that we are now following the VM doc more closely, even when it's contradictory and unclear. So - local storage (for the VM) is on 'LocalStorage' for a host (not sure how VMware plan on making that work in 3i) whilst the shared disks go onto a SAN LUN (EVA 6K). The shared disks are created and zeroed and then added to the VMs.
I've just built a brand new cluster with an older W2K3 build that we have (to see if an MS hotfix in one of our more recent company server images/builds had broken anything) - no joy, same reservation problems. I'm now going to repeat the exercise on my legacy 2.5.x server....
Is anyone else having just the 50 and 57 events (which do seem harmless)?
Ian
Texiwill,
I'm loath to disagree with a published author, particularly one with 2,650 more posts in these forums than I have... but
Page 15 says '...using either a virtual disk or a remote LUN using ...(RDM)...' and Page 16, table 1-1 - Shared Storage Summary shows that 'Virtual disks' are allowed for the 'Cluster in a Box' scenario. Or am I missing something?
We recently migrated WS2K3 VMs from ESX 2.x hosts to new ESX 3.0.2 hosts. Two of the VMs we migrated were part of a Microsoft cluster. They were working fine on the old host systems. Now we're getting the "Reservation of cluster disk... has been lost" errors. I noticed we're also getting occasional cluster communication lost errors, but not nearly as frequently as the disk problems. In the old environment, the shared virtual disks were on locally attached storage and both VMs were hosted by the same parent server. The new environment has 5 host servers in the cluster, and the datastores are on SAN.
In searching for a resolution, I found this post, but I also found this KB document: "Insight Manager causes excessive SCSI reservation conflicts" (1003534) and I wondered if it might be related. All of our VMware host systems are HP and have the 7.8 agents installed. Is anyone else experiencing this issue using HP hardware?
For reference, our VI is based on:
HP DL 380|385 and 580|585 servers, single QLogic HBA going through a Brocade switch into an HP EVA6k.
We don't currently have the HP Insight Agents installed.
Any updates on this?
I'm having the same issue: our clustered disks are getting read/write errors. I was able to pinpoint the issue to the SCSI controller: with bus sharing set to None it works fine, but as soon as it is set to Virtual or Physical the disk gets I/O errors. I've contacted VMware and been told it must be a bug that they are looking into, but we really need a fix as quickly as possible.
Gu5,
We've just finished our upgrade to 3.5.0, which I believe is now the only VMware supported platform for MSCS. However, we had to find a solution quickly and that appears to be to host the VM cluster on a single host and ensure that no other hosts can access the LUN that holds the shared VMDKs. When we were doing our testing (on VMware 3.0.x) this was the only way we could get the cluster nodes to behave properly without any disk errors. This was acceptable for us in the short term (with Dev/Test environments), but it's hardly a solution for production systems.
Now that we've got to 3.5.0, we'll re-run our tests to see if we can get it to run in a stable, error-free manner with multiple hosts - letting us have cluster nodes on different hosts.
We are encountering similar errors:
--- start error message ---
Event Type: Error
Event Source: ClusSvc
Event Category: Physical Disk Resource
Event ID: 1038
Date: 08/21/2008
Time: 3:43:12 PM
User: N/A
Computer:
Description:
Reservation of cluster disk 'Disk S:' has been lost. Please check your system and disk configuration.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
--- end error message ---
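When going through an exported log, it can help to separate the events discussed in this thread automatically. Below is a rough Python sketch that parses a record laid out like the one above. The field layout is taken from that message. The notes attached to IDs 50, 57 and 1038 only paraphrase what posters in this thread have reported and are not an official classification. The names `parse_event`, `describe` and `KNOWN_EVENTS` are made up for illustration.

```python
# Event IDs mentioned in this thread; the notes paraphrase posters' reports.
KNOWN_EVENTS = {
    50: "Delayed Write Failed (reported in this thread as harmless)",
    57: "Ftdisk flush failure (reported in this thread as harmless)",
    1038: "ClusSvc reservation lost (reported to lead to cluster service failure)",
}

SAMPLE = """Event Type: Error
Event Source: ClusSvc
Event Category: Physical Disk Resource
Event ID: 1038
Date: 08/21/2008
Time: 3:43:12 PM
User: N/A
Computer:
Description:
Reservation of cluster disk 'Disk S:' has been lost. Please check your system and disk configuration.
"""

def parse_event(text):
    """Parse 'Field: value' lines; everything after 'Description:' is the description."""
    fields, description, in_description = {}, [], False
    for line in text.splitlines():
        if in_description:
            description.append(line.strip())
        elif line.strip() == "Description:":
            in_description = True
        elif ":" in line:
            # Split on the first colon only, so 'Time: 3:43:12 PM' keeps its value intact.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    fields["Description"] = " ".join(d for d in description if d)
    return fields

def describe(event):
    """Look up the thread's notes for this event ID."""
    return KNOWN_EVENTS.get(int(event["Event ID"]), "not discussed in this thread")

event = parse_event(SAMPLE)
print(event["Event Source"], event["Event ID"], "->", describe(event))
```

Run over a full export, this makes it easy to count how many records are 50/57 noise and how many are the 1038 reservation losses that actually take the cluster service down.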
We have a dedicated standalone host (ESX Server 3.0.2 Update 1 on IBM x3650), not part of a VMware cluster, for virtual machines running MSCS. These virtual machines were migrated from an ESX Server 2.5.x host using VMware Converter 3.0.2, and these problems did not occur on the older host. The virtual machines and shared virtual disks are all on the same SAN LUN (EMC CLARiiON).
Any updates would be greatly appreciated.
I too am having the same issues, running Update 3. Are you running Update 3? Did you get the problem sorted out? How did your testing work out? Any updates and/or feedback you have would be great!
Thanks....
Dan
Same issue here, running a CiB setup. Has anyone contacted VMware yet?
Hello,
Moved to Virtual Machine and Guest OS forum.
Best regards,
Edward L. Haletky
VMware Communities User Moderator, VMware vExpert 2009
