VMware Cloud Community
Alwahidi
Contributor

vSphere 4.1 RDM machines failover/back error with SRM 4.1

hello all,

Here is the situation:

ESX 4.1 hosting two kinds of VMs: some with only VMFS vDisks, and others with VMFS disks for the OS and application files plus vRDM disks for data.

Half of the RDM machines have their vmx and vmdk files on one datastore (1 MB block size), call it Datastore-A; the other half are on another datastore with the same block size (Datastore-B). This is due to a segregation requirement.

The setup is a recent upgrade from VI 3.5. One of the early problems I ran into was snapshotting the RDM VMs. Following a VMware fix, I created a new datastore with an 8 MB block size and pointed the working directory of those RDM VMs at it, and the snapshots then went OK.

But when I ran the recovery plan in the installed SRM 4.1, all VMs with only vDisks failed over and back successfully, as did all the RDM machines on Datastore-A, but the Datastore-B RDM VMs failed. Looking at the SRM array configuration, I noticed that the new datastore I created is associated only with Datastore-A and not with B, even though all three datastores were replicated successfully. I then reviewed the working directory setting of each RDM machine and, to my surprise, only one VM had the setting right; the rest had their working directory set to the placeholder datastore created at the DR site.

I suspected the snapshot fix, so I removed the line from my RDM vmx files. Another failover displayed an even more alarming error: it could not find any RDM devices to mount. I almost lost my production data trying to recover and fail back to the original state.
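For context, the VMware fix mentioned above presumably refers to redirecting the VM's working directory through a line in the .vmx file, which would be the line removed here. A hedged sketch of what such an entry looks like (the datastore and folder names below are made up for illustration):

```
workingDir = "/vmfs/volumes/Datastore-8MB-01/MyRDMVM"
```

With the line removed, the VM falls back to creating its snapshot working files in its own configuration directory.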

I now suspect that SRM failover/failback is somehow analogous to VM snapshots in estimating the total and maximum file sizes before proceeding with the operation. If so, will it work if I create a fourth datastore with an 8 MB block size for the failing VMs, or even better, if I Storage vMotion all my RDM machines to two new datastores with an 8 MB block size?

Thanks.

Abdul

Abdul M. Alwahidi
5 Replies
mal_michael
Commander

Hi,

I now suspect that SRM failover/failback is somehow analogous to VM snapshots in estimating the total and maximum file sizes before proceeding with the operation.

I don't think this is related to SRM.

Moving to new datastores with an 8 MB block size is certainly the best solution. It will simplify the setup and management, since dealing with per-VM parameters is not manageable. Additionally, in SRM environments VMware recommends keeping all of a VM's files in the same folder, so changing the working directory location is not considered best practice.

Two general recommendations:

1) update the virtual hardware and VMware Tools

2) when you Storage vMotion VMs with RDMs, on the "Disk Format" screen you must choose "Same format as source"; otherwise the RDMs will be converted to VMDKs.

Michael.

Alwahidi
Contributor

Hello Michael,

What I meant by the analogy between failing over with SRM and taking a snapshot of a VM is that both operations are somehow related to estimating the maximum file size currently present in the VM folder. According to VMware, this behavior is specific to vSphere 4.1 with regard to snapshots; I don't know if SRM follows the same rule.

Anyway, another issue happened to me yesterday, before I Storage vMotioned the RDM VM to the new 8 MB block size datastores. While the VM was still on its old 1 MB datastore, I tried to add a new RDM disk of 500 GB, but the operation was refused because the maximum file size (500 GB) is not supported (256 GB), which is normal. The strange thing is that during the whole upgrade process of that VM, for example, it never showed this message, even though the VM was holding 9 RDM disks of 2 TB each.
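For what it's worth, that 256 GB ceiling matches the documented VMFS-3 limits, which scale linearly with block size: a file can span roughly 256K (262,144) file blocks, less a few hundred bytes of overhead. A quick sketch of the arithmetic, using binary units for illustration:

```python
# VMFS-3 maximum file size scales linearly with the datastore block size:
# a single file can span roughly 262,144 file blocks.
MB = 1024 ** 2
GB = 1024 ** 3

def vmfs3_max_file_size(block_size_mb):
    """Approximate VMFS-3 maximum file size (bytes) for a given block size in MB."""
    return block_size_mb * MB * 262144

for bs in (1, 2, 4, 8):
    print(f"{bs} MB block size -> max file ~{vmfs3_max_file_size(bs) // GB} GB")
```

On this arithmetic, a 500 GB mapping entry is over the 256 GB limit of a 1 MB block-size datastore, but well under the roughly 2 TB limit of an 8 MB one.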

What I understand from this is that an old VM setup will be maintained during the upgrade no matter what, but new operations such as snapshotting, adding new vDisks, or even failing over with SRM will not be supported unless you move to a datastore with a bigger block size.

Anyway, today is my scheduled date to try the failover/failback. I'll keep you posted on the results I get.

Thanks.

Abdul

Abdul M. Alwahidi
mal_michael
Commander

18 TB - wow! Pretty big VM! Don't even try to Storage vMotion it. Shut down the VM, disconnect the RDMs (write down the SCSI ID of each disk), perform a cold migration, then reconnect the RDMs. If you don't disconnect the RDMs during cold migration, they may be converted to VMDKs.

Michael.

Alwahidi
Contributor

I see.

I had a look at the VM folder before migrating to 4.1 and noticed it contained 9 files of 2 TB each. I thought this was normal. Is it? Because if it is not, then according to you, Michael, they could have been converted from RDMs to VMDKs well before I did the migration.

Abdul M. Alwahidi
mal_michael
Commander

Hi,

If you are looking at the folder's contents via the datastore browser, it is normal to see a VMDK file with the same size as the RDM. That is simply a mapping file; it does not really take up that space.
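As an aside, the "big file that takes no real space" effect can be illustrated with a sparse file, which is analogous in that its apparent size and its on-disk allocation differ (this is only an analogy; an RDM pointer is a small descriptor that merely reports the raw LUN's size). A minimal Python sketch, assuming a filesystem that supports sparse files:

```python
import os
import tempfile

# Create a sparse file: logical size 1 GiB, almost nothing allocated on disk.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "r+b") as f:
    f.truncate(1024 ** 3)  # set the apparent size without writing any data

st = os.stat(path)
apparent = st.st_size            # what a file browser (or ls -l) reports
allocated = st.st_blocks * 512   # space actually occupied on disk
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
os.remove(path)
```

The apparent size reads as a full gigabyte while the allocated size stays near zero, much like a mapping file showing the LUN's full 2 TB in the datastore browser.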

If you open "Edit Settings" on your VM, you can easily see which hard disks are VMDKs and which are RDMs.

Michael.
