VMware Cloud Community
kpedersen
Contributor
Contributor

New SRM 5.0 install -- unable to Create Protection Group

Hi folks,

Trying to get SRM 5.0 underway and for some reason, creation of Protection Group just hangs for hours at 57% with no status or group created.  Only way to clear it is to restart vCenter service, not SRM service.

I'm trying to protect a small datastore on a NetApp to the DR site.  Snapmirror is all configured, SRM and SRA's installed, mapped, paired and everthing else without too much issue. 

vCenter Server 5.0

ESX 4.1 U2 Hosts

SRM 5.0

NetApp FAS SRA 2.0

NetApp ONTAP 8.0

Any advice would be greatly appeciated.

Karl

Reply
0 Kudos
5 Replies
kpedersen
Contributor
Contributor

Log lines that I have been unable to find anything on that indicate error:

2011-12-22T22:27:35.302-05:00 [04848 verbose 'DrTask' opID=FEA28620-00000023] Created VC task 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup:task-2311'
2011-12-22T22:27:35.302-05:00 [00284 warning 'DrTask' opID=FEA28620-00000023] Failed to create 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup' task in VC: (dr.fault.ConnectionDownFault) {

What I beleive to be all the relevant log data on this job:

2011-12-22T22:27:35.177-05:00 [02304 verbose 'Replication' opID=FEA28620-00000023] Starting internal API method for Replication Manager 'DrReplicationManager': Dr::Replication::VmomiLayer::ReplicationManagerImpl::CreateProtectionGroup
2011-12-22T22:27:35.177-05:00 [02304 verbose 'StorageProvider' opID=FEA28620-00000023] CreateGroupCheckPrivilege: 'BOSNATop1_Rep_DS01',
[#1] --> (dr.storageProvider.GroupSettings) {
[#1] -->    dynamicType = <unset>,
[#1] -->    array = 'dr.storage.ReplicatedArrayPair:array-pair-4041',
[#1] -->    datastore = (vim.Datastore) [
[#1] -->       'vim.Datastore:datastore-37'
[#1] -->    ],
[#1] --> }
2011-12-22T22:27:35.177-05:00 [02304 verbose 'LocalVC' opID=FEA28620-00000023] [AuthorizationDomain] Processing '1' privilege queries.
2011-12-22T22:27:35.177-05:00 [02304 verbose 'LocalVC' opID=FEA28620-00000023] [AuthorizationDomain] Fetching managed entities on which to check priveleges
2011-12-22T22:27:35.177-05:00 [02304 info 'DrTask' opID=FEA28620-00000023] Starting task 'dr.replication.ReplicationManager.createProtectionGroup14'
2011-12-22T22:27:35.177-05:00 [05208 verbose 'DrTask' opID=FEA28620-00000023] Starting VC task 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup'
2011-12-22T22:27:35.177-05:00 [02304 verbose 'PropertyProvider' opID=FEA28620-00000023] RecordOp ASSIGN: info.progress, dr.replication.ReplicationManager.createProtectionGroup14
2011-12-22T22:27:35.177-05:00 [05208 verbose 'DrTask' opID=FEA28620-00000023] Starting VC task 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup'
2011-12-22T22:27:35.224-05:00 [04848 verbose 'LocalVC' opID=FEA28620-00000023] [AuthorizationDomain] Processing effective roles of managed entities to validate privileges.
2011-12-22T22:27:35.224-05:00 [04848 verbose 'PropertyProvider' opID=FEA28620-00000023] RecordOp ASSIGN: info.progress, dr.replication.ReplicationManager.createProtectionGroup14
2011-12-22T22:27:35.224-05:00 [04848 info 'MultiSiteOperationManager' opID=FEA28620-00000023] Starting multi-site operation 'dr.replication.createProtectionGroup', tag ''
2011-12-22T22:27:35.224-05:00 [04848 verbose 'MultiSiteOperationManager' opID=FEA28620-00000023] Multi-site operation coordinator 'coordinator-7022' created
2011-12-22T22:27:35.224-05:00 [04848 verbose 'PropertyProvider' opID=FEA28620-00000023] RecordOp ADD: coordinator["coordinator-7022"], MultiSiteOperationManager
2011-12-22T22:27:35.224-05:00 [04848 verbose 'MultiSiteOperationManager' opID=FEA28620-00000023] Cohort 'cohort-7023' created associated with coordinator 'coordinator-7022' on site 'bd49c809-ae35-4323-b3b1-d65fa5d89669'
2011-12-22T22:27:35.224-05:00 [04848 verbose 'PropertyProvider' opID=FEA28620-00000023] RecordOp ADD: cohort["cohort-7023"], MultiSiteOperationManager
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PropertyProvider'] RecordOp ASSIGN: info.progress, dr.replication.ReplicationManager.createProtectionGroup14
2011-12-22T22:27:35.286-05:00 [04108 verbose 'RemoteDR' ctxID=769b8d07 opID=FEA28620-00000023] [PCM] Using token '6' for pending filter creation
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PropertyProvider' ctxID=769b8d07 opID=FEA28620-00000023] RecordOp ASSIGN: prepare, coordinator-7022
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PropertyProvider' ctxID=769b8d07 opID=FEA28620-00000023] RecordOp ASSIGN: prepareData, coordinator-7022
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PerformanceMonitor' ctxID=7c7599a1 opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-1009 begin.
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PersistableRWLock' ctxID=7c7599a1 opID=FEA28620-00000023] Lock PersistableRWLock-1009' obtained for Read
2011-12-22T22:27:35.286-05:00 [04108 verbose 'Replication' opID=FEA28620-00000023] Adding reservation for group 'BOSNATop1_Rep_DS01' in folder '[dr.Folder:DrReplicationRootFolder]'
2011-12-22T22:27:35.286-05:00 [04108 verbose 'Default' opID=FEA28620-00000023] Reservation made for addition of key BOSNATop1_Rep_DS01
2011-12-22T22:27:35.286-05:00 [04108 info 'Replication' opID=FEA28620-00000023] Preparing the provider for the CreateProtectionGroup operation for group protection-group-7024.
2011-12-22T22:27:35.286-05:00 [04108 verbose 'StorageProvider' opID=FEA28620-00000023] CreateGroupPrepare: 'protection-group-7024', 'BOSNATop1_Rep_DS01'
--> (dr.storageProvider.GroupSettings) {
-->    dynamicType = <unset>,
-->    array = 'dr.storage.ReplicatedArrayPair:array-pair-4041',
-->    datastore = (vim.Datastore) [
-->       'vim.Datastore:datastore-37'
-->    ],
--> }
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PerformanceMonitor' opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-7004 begin.
2011-12-22T22:27:35.286-05:00 [04108 verbose 'PersistableRWLock' opID=FEA28620-00000023] Lock PersistableRWLock-7004' obtained for Read
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Storage' opID=FEA28620-00000023] Acquiring read op lock on 'array-pair-4041'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PerformanceMonitor' opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-4042 begin.
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PersistableRWLock' opID=FEA28620-00000023] Lock PersistableRWLock-4042' obtained for Read
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Storage' opID=FEA28620-00000023] Acquired read op lock on 'array-pair-4041'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Storage' opID=FEA28620-00000023] Added dependent 'protection-group-7024' to array pair 'array-pair-4041'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Storage' opID=FEA28620-00000023] Releasing read op lock on 'array-pair-4041'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PerformanceMonitor' opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-4042. 'Locked' took 0 seconds
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PersistableRWLock' opID=FEA28620-00000023] Releasing Read lock 'PersistableRWLock-4042'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PerformanceMonitor' opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-7004. 'Locked' took 0.016 seconds
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PersistableRWLock' opID=FEA28620-00000023] Releasing Read lock 'PersistableRWLock-7004'
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PersistableRWLock'] Destroying NON persisted released token 0 of lock PersistableRWLock-4042
2011-12-22T22:27:35.302-05:00 [04228 verbose 'PersistableRWLock'] Destroying NON persisted released token 0 of lock PersistableRWLock-7004
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] The provider successfully prepared for the CreateProtectionGroupoperation for group protection-group-7024.
2011-12-22T22:27:35.302-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] Persisting the prepare phase results in provider for creation of group protection-group-7024
2011-12-22T22:27:35.302-05:00 [04228 verbose 'StorageProvider' opID=FEA28620-00000023] CreateGroupPreparePersist: 'protection-group-7024', 'BOSNATop1_Rep_DS01'
--> (dr.storageProvider.GroupSettings) {
-->    dynamicType = <unset>,
-->    array = 'dr.storage.ReplicatedArrayPair:array-pair-4041',
-->    datastore = (vim.Datastore) [
-->       'vim.Datastore:datastore-37'
-->    ],
--> }
2011-12-22T22:27:35.302-05:00 [04848 verbose 'DrTask' opID=FEA28620-00000023] Created VC task 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup:task-2311'
2011-12-22T22:27:35.302-05:00 [00284 warning 'DrTask' opID=FEA28620-00000023] Failed to create 'com.vmware.vcDr.dr.replication.ReplicationManager.createProtectionGroup' task in VC: (dr.fault.ConnectionDownFault) {
-->    dynamicType = <unset>,
-->    faultCause = (vmodl.MethodFault) null,
-->    msg = "",
--> }
2011-12-22T22:27:35.302-05:00 [00284 warning 'DrTask' opID=FEA28620-00000023] Ignoring SetProgress because task does not exist
2011-12-22T22:27:35.302-05:00 [00284 warning 'DrTask' opID=FEA28620-00000023] Ignoring SetProgress because task does not exist
2011-12-22T22:27:35.302-05:00 [00284 warning 'DrTask'] Ignoring SetProgress because task does not exist
2011-12-22T22:27:35.317-05:00 [04228 verbose 'StorageProvider' opID=FEA28620-00000023] Created AcquiredDatastores DB object '7025' for local site.
2011-12-22T22:27:35.317-05:00 [04228 verbose 'StorageProvider' opID=FEA28620-00000023] Created ProtectionGroupInProgress DB object '7027' for ProtectionGroup 'protection-group-7024' and ReplicatedArrayPair 'array-pair-4041'.
2011-12-22T22:27:35.317-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] Persisting new group reservation for group protection-group-7024
2011-12-22T22:27:35.349-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] Persisting folder reservation for group protection-group-7024
2011-12-22T22:27:35.349-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] Committing the prepare phase database transaction for creation of group protection-group-7024
2011-12-22T22:27:35.380-05:00 [04228 verbose 'Replication' opID=FEA28620-00000023] Completing the prepare phase results in provider for creation of group protection-group-7024
2011-12-22T22:27:35.380-05:00 [04228 verbose 'StorageProvider' opID=FEA28620-00000023] CreateGroupPrepareComplete
2011-12-22T22:27:35.380-05:00 [04228 verbose 'PerformanceMonitor' opID=FEA28620-00000023] Performance monitor Token 0 of lock PersistableRWLock-1009. 'Locked' took 0.094 seconds
2011-12-22T22:27:35.380-05:00 [04228 verbose 'PersistableRWLock' opID=FEA28620-00000023] Releasing Read lock 'PersistableRWLock-1009'
2011-12-22T22:27:35.380-05:00 [04228 verbose 'PersistableRWLock'] Destroying NON persisted released token 0 of lock PersistableRWLock-1009
2011-12-22T22:27:35.380-05:00 [05224 verbose 'PropertyProvider' ctxID=7c7599a1 opID=FEA28620-00000023] RecordOp ASSIGN: vote, cohort-7023
2011-12-22T22:27:35.380-05:00 [05224 verbose 'PropertyProvider' ctxID=7c7599a1 opID=FEA28620-00000023] RecordOp ASSIGN: prepareData, cohort-7023
2011-12-22T22:27:35.380-05:00 [04228 verbose 'PropertyProvider' ctxID=769b8d07 opID=FEA28620-00000023] RecordOp ASSIGN: info.progress, dr.replication.ReplicationManager.createProtectionGroup14
2011-12-22T22:27:35.380-05:00 [05052 warning 'DrTask' opID=FEA28620-00000023] Ignoring SetProgress because task does not exist
Reply
0 Kudos
kpedersen
Contributor
Contributor

Anybody?

I've opened an SR and will post back any findings here if I here back from them in a timely manner, but please jump in with any suggestions in the meantime.

Thanks much,

Karl

Reply
0 Kudos
gorkemg
Contributor
Contributor

I have the same exact situation.

Create Protection group progress bar stalls at %57.

My scenerio is

192.192.1.0 network

192.192.1.10 esxi5

192.192.1.20 vcenter

192.192.1.30 test w2003 server

192.192.2.0 network

192.192.2.10 esxi5

192.192.2.20 vcenter

a router connect 2 networks.There is no policy or acl.Any-Any accepted.

I configured all Vsphere Replication settings and other settings.

replication.jpg

I tried everything and always stalls at %57 .

Any ideas ?

Reply
0 Kudos
kpedersen
Contributor
Contributor

gorkemg,

I'm not sure if our issues are the same, as I'm sure it would depend on the nature of the errors you are seeing.  I'm on Windows 2008 R2, so my SRM log file was located at C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs, and I restarted vCenter and SRM services to start from clear and then tried to create the plan to make life easier.

For general troubleshooting advice, I used this post though it is dated for SRM 4 -- http://www.yellow-bricks.com/2009/08/11/srm-faq/

As far as the resolution to my issue, technical support came to the rescue and quickly identified the problem as a java memory problem because my normal SRM sessions were timing out far too quickly.  The support engineer opened up Programs > VMWare > VMWare Tomcat > Configure Tomcat and then opened up the Java tab to up the maximum memory from 1GB to 2GB and then restarted services.

After that, made all my protection groups and recovery plans without any significant issues.  I am running a fairly small environment, so I loaded vCenter and SRM on the same box which probably explains alot of my woes.  I also cranked the memory up on the vCenter as well.

Hope this helps you out too,

Karl

Reply
0 Kudos
gorkemg
Contributor
Contributor

Karl,

Thank you for your quick response.

As I only said that "Different Scenerios same solution: Smiley Happy "

Thank you very very much.

You saved my day.

I think thats a configuration tweak problem.I have 2 dell r710 servers with 24 gb ram each.They're quite powerful for this test enviroment.

Gorkem.

Reply
0 Kudos