VMware Cloud Community
Stefano_Giulian
Contributor

Oracle 11g RAC on ESX 3.5 "cluster in a box": reservation conflict, reboot

We created two Red Hat Enterprise Linux 5 virtual machines on the same ESX 3.5 host and installed Oracle 11g RAC.

The boot disk of each machine is on the local storage of the ESX server, while the shared disks are on a VMFS datastore on the SAN (EMC CX3-10).

The shared disks are on a dedicated SCSI controller with SCSI bus sharing set to "virtual".
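For reference, that setup corresponds to .vmx entries roughly like the following (the controller number matches our setup; the disk path is illustrative, not our real one). If I remember correctly, shared cluster disks should also be created eagerzeroedthick (vmkfstools -d eagerzeroedthick):

scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "virtual"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "/vmfs/volumes/SAN_VMFS/rac-shared/votedisk.vmdk"
scsi1:0.mode = "independent-persistent"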

The installation went fine and the RAC seems to work, but sometimes a machine reboots unexpectedly.

In /var/log/messages I see:

Aug 25 06:38:33 ttsc-rac2 kernel: sd 1:0:1:0: reservation conflict
Aug 25 06:38:33 ttsc-rac2 kernel: sd 1:0:1:0: SCSI error: return code = 0x00000018
Aug 25 06:38:33 ttsc-rac2 kernel: end_request: I/O error, dev sdd, sector 81
Aug 25 06:38:33 ttsc-rac2 logger: Oracle clsomon failed with fatal status 12.
Aug 25 06:38:33 ttsc-rac2 logger: Oracle CSSD failure 134.
Aug 25 06:38:33 ttsc-rac2 logger: Oracle CRS failure. Rebooting for cluster integrity.

/dev/sdd is the voting disk.

In the Oracle log (/u01/app/crs/11.1.0/crs/log/ttsc-rac1/alertttsc-rac1.log) I see many lines saying that the "voting file is offline", but their timestamps don't correlate with the error above. After many errors the voting disk comes back online.
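For what it's worth, these are the checks I run (CRS home path as in our install); the dd is a non-destructive single-block read of the voting device:

/u01/app/crs/11.1.0/crs/bin/crsctl query css votedisk
dd if=/dev/sdd of=/dev/null bs=512 count=1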

Any idea?

Thanks,

Stefano

3 Replies
Stefano_Giulian
Contributor

Update: we moved the shared disks from VMFS virtual disks to RDMs in physical compatibility mode, and the problem is solved; the RAC cluster now works properly.
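In case it helps anyone else, an RDM pointer file in physical compatibility mode can be created with vmkfstools -z; the device path and datastore name below are illustrative, not our exact ones. The mapping file is then attached to both VMs on the dedicated shared SCSI controller:

vmkfstools -z /vmfs/devices/disks/vmhba1:0:1:0 /vmfs/volumes/SAN_VMFS/rac-shared/votedisk-rdm.vmdk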

S>

bhoros
Contributor

Hey,

Funny thing: when we first set ours up, we did it with RDMs, and then a few months back, while preparing for go-live, I moved these test servers to VMDKs. We never had this reboot issue with the RDMs, and now we have it with the VMDK setup.

My question is: how has it been running with the RDMs? Any tips or pointers?

This is the first time I have found anyone doing what we were:

a 2-node physical RHEL 5.3 cluster with GFS, and RAC 11g running on top.

We mirror that in ESX using a manual fence node. Have you tried VM fencing?

Just curious how things are working for you.

phimic
Contributor

Hello Community,

I have exactly the same problem on VMware ESX 3.5 U4 with Oracle RAC 10.2.0.1 running on SLES 10 SP3. I use a second LSI Logic adapter (1:0) with bus sharing set to "virtual" on both nodes. All nodes can access the shared raw partitions, but once CRS is running, both systems reset. Here are the last lines of /var/log/messages on both cluster nodes:

Nov 25 11:39:42 rac1 kernel: sd 1:0:1:0: reservation conflict
Nov 25 11:39:42 rac1 kernel: sd 1:0:1:0: SCSI error: return code = 0x00000018
Nov 25 11:39:42 rac1 kernel: end_request: I/O error, dev sdc, sector 49
Nov 25 11:40:16 rac1 logger: Oracle CSSD failure. Rebooting for cluster integrity.

sdc is the Oracle RAC voting disk. Any help would be appreciated.
