We created two Red Hat Linux 5 virtual machines on the same ESX 3.5 host and installed Oracle 11g RAC.
The boot disk of each machine is on the local storage of the ESX server, while the shared disks are on a VMFS datastore on the SAN (EMC CX3-10).
The shared disks are on a dedicated SCSI controller with bus sharing set to "virtual".
The installation was OK and the RAC seems to work, but sometimes a machine reboots unexpectedly.
In /var/log/messages I see:
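For reference, the relevant part of each node's VM configuration looks roughly like this; controller number, datastore path, and disk file name are illustrative placeholders, not our actual values:

```
# Excerpt from the .vmx file of each RAC node (values are examples).
# scsi1 is the dedicated controller carrying only the shared disks.
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "virtual"        # "virtual" bus sharing: VMs on the same ESX host
scsi1:0.present = "TRUE"
scsi1:0.fileName = "/vmfs/volumes/san_vmfs/rac/shared01.vmdk"
scsi1:0.deviceType = "scsi-hardDisk"
```

The boot disk stays on scsi0 with no bus sharing; only the shared disks live on scsi1.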
Aug 25 06:38:33 ttsc-rac2 kernel: sd 1:0:1:0: reservation conflict
Aug 25 06:38:33 ttsc-rac2 kernel: sd 1:0:1:0: SCSI error: return code = 0x00000018
Aug 25 06:38:33 ttsc-rac2 kernel: end_request: I/O error, dev sdd, sector 81
Aug 25 06:38:33 ttsc-rac2 logger: Oracle clsomon failed with fatal status 12.
Aug 25 06:38:33 ttsc-rac2 logger: Oracle CSSD failure 134.
Aug 25 06:38:33 ttsc-rac2 logger: Oracle CRS failure. Rebooting for cluster integrity.
/dev/sdd is the voting disk.
In the Oracle log (/u01/app/crs/11.1.0/crs/log/ttsc-rac1/alertttsc-rac1.log) I see many lines saying "voting file is offline", but they are not time-correlated with the errors above. After many errors the voting disk comes back online.
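While debugging, it can help to confirm which device Clusterware considers the voting disk and whether it is readable at the block level. These are standard crsctl/dd invocations; the device path is our case's /dev/sdd and may differ on your system:

```shell
# Ask CRS where it thinks the voting disk is (run as root or the CRS owner).
$ORA_CRS_HOME/bin/crsctl query css votedisk

# Cross-check the block device and test a raw read.
ls -l /dev/sdd
dd if=/dev/sdd of=/dev/null bs=512 count=1   # should complete without an I/O error
```

If the dd read fails with an I/O error while the reservation-conflict messages appear, the problem is below Oracle, at the storage/virtualization layer.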
Any idea?
Thanks,
Stefano
Update: we moved the shared virtual disks from VMFS to RDMs with compatibility mode = physical, and now the problem is solved; the RAC cluster works properly.
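For anyone hitting the same issue: a physical-compatibility RDM pointer file can be created from the ESX service console with vmkfstools and then attached to the dedicated shared SCSI controller of each node. The device name and datastore path below are placeholders for your own LUN and volume:

```shell
# Create an RDM mapping file in physical compatibility mode (-z)
# for the shared LUN. Device and datastore names are examples only.
vmkfstools -z /vmfs/devices/disks/vmhba1:0:12:0 \
           /vmfs/volumes/local_vmfs/rac-shared/votingdisk-rdm.vmdk
```

With physical compatibility, SCSI reservations are passed straight through to the array rather than being emulated by the VMkernel, which is presumably why the reservation conflicts on the voting disk disappear.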
S>
Hey,
Funny thing: when we first set this up, we did it with RDMs, and then a few months back, while preparing for go-live, I moved these test servers to VMDKs. We never had this reboot issue with the RDMs, and now we have it with the VMDK setup.
My question is: how has it been running with the RDMs? Any tips or pointers?
This is the first time I have found anyone doing what we were: a 2-node physical RHEL 5.3 cluster with GFS, running RAC 11g on top.
We mirror that setup in ESX using a manual fence node. Have you tried VM fencing?
Just curious how things are working for you.
Hello Community,
I have exactly the same problem on VMware ESX 3.5 U4 with Oracle RAC 10.2.0.1 running on SLES 10 SP3. I use a second LSI adapter (1:0) with bus sharing type "virtual" on both nodes. All nodes can access the shared raw partitions, but once CRS is running, both systems keep resetting. Here are the last lines of /var/log/messages from both cluster nodes:
Nov 25 11:39:42 rac1 kernel: sd 1:0:1:0: reservation conflict
Nov 25 11:39:42 rac1 kernel: sd 1:0:1:0: SCSI error: return code = 0x00000018
Nov 25 11:39:42 rac1 kernel: end_request: I/O error, dev sdc, sector 49
Nov 25 11:40:16 rac1 logger: Oracle CSSD failure. Rebooting for cluster integrity.
sdc is the Oracle RAC voting disk. Any help would be appreciated.