VMware Cloud Community
nsolop
Expert

Recovery plan test failed - A file was not found error

Hi all,

I finally got my SRM environment installed a few days ago and successfully ran a small recovery plan test with 2 virtual machines. Today we ran a new test with 18 low-priority VMs, and everything went smoothly until, after 15 minutes, the process failed with the error "A file was not found" while powering on the VMs at the remote site.

Basically, we have one EMC CLARiiON at each site with MirrorView, configured following EMC's VMware Site Recovery Manager with EMC CLARiiON CX3 and MirrorView/S Implementation Guide, and SRM 1.0. We noticed that the snapshot created on the recovery CLARiiON went to the Active state right after the test began, but about 15 minutes later it went back to Inactive.

Further investigation shows that the error reported by the recovery plan test is accurate: the snapshot is no longer active, and all the VM files (VMX and VMDK) that SRM is trying to power on reside on that snapshot.

Has any of you seen something like this?

Thanks in advance.

Saludos/Regards

Nicolas Solop

Buenos Aires, Argentina

My company

My LinkedIn profile

Spanish-language virtualization group on LinkedIn


7 Replies
bladeraptor
VMware Employee

Hi

I am writing this as an EMC employee

Can you confirm that all of these Virtual Machines are on a single LUN?

How big is this LUN in terms of capacity?

How big is the Reserved LUN Pool (RLP) LUN you have allocated to hold the change data?

When you run the SRM job and go into the 'View Events' menu option on the CLARiiON SPs, are there any error messages there?

Can you post the SRM logs from the All Users > Application Data > VMware > SRM > Logs folder?

The CLARiiON SRA should also have a log in the Program Files > VMware > SRM > Scripts > SAN > Mirrorview folder.

The paths above are relative and may be different for you.

If you have the capacity, I would try increasing the size of the RLP or adding additional volumes to the pool.

Many thanks

Alex Tanner

CHogan
VMware Employee

Nicolas,

This doc may also help you - http://viops.vmware.com/home/docs/DOC-1227

Cormac

http://cormachogan.com
nsolop
Expert

Hello Cormac,

I have already reviewed the document you pointed to in your reply; in fact, part of the configuration we performed last week was based on it. Right now, though, it doesn't help me with this problem.

Thanks for taking the time to answer my question!

Saludos/Regards

Nicolas Solop

nsolop
Expert

Hello Alex,

All the virtual machines are on the same LUN/VMFS datastore, which is 300GB in size. I'm not the storage administrator, so please bear with me. Early this morning I spoke with the storage admin, and he told me that he had set up two 5GB private LUNs (I hope these are the Reserved LUN Pool that you mention in your reply).

Last week, when we performed the small test with fewer VMs, the two LUNs were 200MB each; for the test last night, the storage admin had increased them to 5GB each. Do you think the root cause of this error is related to these reserved LUNs? If so, I'll ask for bigger LUNs and retry the test.
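For reference, the share of the datastore that these private LUNs represent can be worked out as below. This is only a rough sketch: it assumes the two private LUNs really are the Reserved LUN Pool, it uses decimal units (200MB = 0.2GB), and the function name `reserved_fraction` is made up for illustration.

```python
def reserved_fraction(reserved_luns_gb, source_lun_gb):
    """Return total reserved capacity as a fraction of the source LUN."""
    return sum(reserved_luns_gb) / source_lun_gb


SOURCE_LUN_GB = 300  # size of the LUN/VMFS datastore from the post

# Small test last week: two 200MB private LUNs (assumed 0.2GB each).
first_test = reserved_fraction([0.2, 0.2], SOURCE_LUN_GB)

# Test with 18 VMs: two 5GB private LUNs.
second_test = reserved_fraction([5, 5], SOURCE_LUN_GB)

print(f"first test:  {first_test:.2%} of the source LUN")
print(f"second test: {second_test:.2%} of the source LUN")
```

In both tests the reserved space works out to only a few percent of the 300GB datastore at most.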

Since I don't have access to the servers right now I'm not able to upload the logs. Will try to do it tomorrow morning.

Thanks again.

Saludos/Regards

Nicolas Solop

bladeraptor
VMware Employee

Hi

Our best practice for configuring CLARiiON SnapView, which takes copy-on-first-write point-in-time snapshots of the production volume, is to allocate between 20% and 30% of the production LUN's capacity to the Reserved LUN Pool (RLP) LUN.

So in the case of your 300GB volume, you should provision at least a 30GB, if not a 60GB, LUN in your Reserved LUN Pool area. You may need more than one: for example, if you have a consistent snapshot that includes several LUNs, that will add to the space required.
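The arithmetic behind these figures can be sketched as follows. The helper names are hypothetical and the percentages are simply the ones quoted in this thread, not values read from any EMC tool. Note that a strict 20-30% of 300GB comes to 60-90GB, while the 30GB and 60GB figures correspond to 10% and 20%, so treat the result as a starting range rather than an exact requirement.

```python
def rlp_size_range_gb(source_lun_gb, low_pct=20, high_pct=30):
    """Return the (low, high) reserved-capacity range in GB for a source LUN."""
    return source_lun_gb * low_pct / 100, source_lun_gb * high_pct / 100


def pct_of_source(reserved_gb, source_lun_gb):
    """Return reserved capacity as a percentage of the source LUN."""
    return 100 * reserved_gb / source_lun_gb


SOURCE_LUN_GB = 300  # datastore size reported in the thread

low_gb, high_gb = rlp_size_range_gb(SOURCE_LUN_GB)
print(f"20-30% guideline: {low_gb:.0f}GB to {high_gb:.0f}GB")

# Compare the sizes mentioned in the thread against the source LUN.
for size_gb in (10, 30, 60):  # 10GB = the two existing 5GB LUNs combined
    print(f"{size_gb}GB is {pct_of_source(size_gb, SOURCE_LUN_GB):.1f}% of the source LUN")
```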

I would try increasing the RLP LUN to that sort of size and then running the test again.
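To illustrate why an undersized reserve could cause a snapshot to drop out partway through a test, here is a toy model of a copy-on-first-write snapshot with a fixed amount of reserve space. It is a simplified sketch, not SnapView's actual implementation: the class, the chunk counts, and the assumption that the snapshot goes inactive once the reserve is full are all illustrative.

```python
class CofwSnapshot:
    """Toy copy-on-first-write snapshot backed by a fixed-size reserve."""

    def __init__(self, reserve_chunks):
        self.reserve_chunks = reserve_chunks
        self.saved = set()  # chunk ids whose original data was preserved
        self.active = True

    def write(self, chunk_id):
        """Handle a write that touches chunk_id after the snapshot was taken."""
        if not self.active or chunk_id in self.saved:
            return  # only the FIRST write to a chunk consumes reserve space
        if len(self.saved) >= self.reserve_chunks:
            self.active = False  # reserve exhausted: point-in-time view is lost
            return
        self.saved.add(chunk_id)


def first_failing_write(reserve_chunks, write_stream):
    """Return the 1-based index of the write that deactivates the snapshot, or None."""
    snap = CofwSnapshot(reserve_chunks)
    for n, chunk_id in enumerate(write_stream, start=1):
        snap.write(chunk_id)
        if not snap.active:
            return n
    return None


# Many VMs powering on touch many distinct chunks; model that as 1000 unique chunks.
small_reserve = first_failing_write(100, range(1000))
large_reserve = first_failing_write(2000, range(1000))
print(small_reserve, large_reserve)
```

In this model a small test that touches few distinct chunks fits in a tiny reserve, while a larger test with more changed data exhausts it after a while, which is at least consistent with the 2-VM test passing and the 18-VM test failing after about 15 minutes.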

Regards

Alex Tanner

nsolop
Expert

Hi Alex, I have asked the storage administrator for bigger LUNs. I hope this configuration will be available tomorrow so we can run the tests again.

I'll let you know the results.

Thanks again.

Saludos/Regards

Nicolas Solop

nsolop
Expert

Alex, your solution worked like a champ.

Thank you.

Saludos/Regards

Nicolas Solop
