Hi guys,
I have just configured SRM 4 on both sites (Prod and DR). Both sites run ESX 4.1 and vCenter Server 4.1.
Every now and then I get the error messages shown in the attached document. Please advise what went wrong.
vCenter loses its connection to the local SRM server. In addition, as a separate issue, the two sites stop communicating with each other.
Please see the attached document. I have already applied KB article 1029057 regarding Tomcat memory, but no joy.
Thanks
Hey Techno,
What are your localSiteStatus.checkInterval and localSiteStatus.eventfrequency settings?
Also, what is your remoteSiteStatus.checkInterval setting?
Thanks,
-rp
Hi,
Do you have a firewall between the two sites?
Michael.
Hello mate, thanks for your reply.
At Production:
localSiteStatus settings:
localsitestatus.checkinterval = 60 secs
localsitestatus.eventfrequency = 600 secs
remoteSiteStatus settings:
remoteSiteStatus.checkinterval = 60 secs
remoteSiteStatus.panicDelay = 5
At DR site:
localSiteStatus settings:
localsitestatus.checkinterval = 120 secs
localsitestatus.eventfrequency = 600 secs
remoteSiteStatus settings:
remoteSiteStatus.checkinterval = 120 secs
remoteSiteStatus.panicDelay = 5
No firewall between the sites; both are on the LAN.
Hello,
We have the same issue with the SRM plug-in losing its connection. Is there a workaround or fix yet? So far I haven't found any confirmed working solution.
Thanks
I have opened a case with VMware, but their response in this particular case has been hopeless; normally they are good at fixing problems.
Can you please clarify what you mean by "both sites stop communicating"? Do you mean SRM, vCenter, or the WAN?
Edit: Generate a log bundle. Go to the SRM folder in the Start menu and click "Generate Site Recovery Manager Log Bundle".
SRM
Okay, then generate some logs. Also look in Event Viewer on your SRM server for any strange events.
It could be a DB issue. What DB are you using: local or remote? Are any other applications using this DB? If it is SQL Server, are you using SQL authentication? Have you configured your SQL Server to accept connections on a single port, or on a range?
It's the bundled SQL Server 2005 Express, local on both sites. No other apps use this DB; it is purely for SRM.
Windows authentication.
I have checked SQL Server Configuration Manager and Management Studio; both only offer an option to set a single port for communication. There is no range setting.
No errors in Event Viewer or vCenter.
VMware took a WebEx session and collected all the logs, and there has been no reply for 2 weeks.
Why did you change the Advanced Settings on the DR site's SRM?
localsitestatus.checkinterval = 120
remotesitestatus.checkinterval = 120
The default is 60. Set both back to 60 and see if the issue persists. You will probably need to restart the SRM service for the changes to take effect.
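Since the two sites had drifted apart, here is a small Python sketch that flags any checkInterval that differs from the 60-second default mentioned above, or that is not the same at both sites. It does not read SRM's configuration: the values are copied by hand from the posts in this thread, and the setting names are simply labels in the camelCase used earlier in the thread.

```python
# Minimal sketch: compare hand-copied SRM Advanced Settings against the
# 60-second checkInterval default quoted in this thread. Nothing here
# talks to SRM; the dictionary below is typed in from the forum posts.
DEFAULT_CHECK_INTERVAL = 60  # seconds, per the reply above

posted = {
    "Production": {
        "localSiteStatus.checkInterval": 60,
        "remoteSiteStatus.checkInterval": 60,
    },
    "DR": {
        "localSiteStatus.checkInterval": 120,
        "remoteSiteStatus.checkInterval": 120,
    },
}


def non_default_intervals(settings: dict[str, dict[str, int]]) -> list[tuple[str, str, int]]:
    """Return (site, setting, value) for each checkInterval that is not the default."""
    flagged = []
    for site, values in settings.items():
        for name, value in values.items():
            if name.endswith(".checkInterval") and value != DEFAULT_CHECK_INTERVAL:
                flagged.append((site, name, value))
    return flagged


def mismatched_between_sites(settings: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Return each setting whose value is not identical at every site that has it."""
    names = {name for values in settings.values() for name in values}
    result = {}
    for name in sorted(names):
        per_site = {site: values[name] for site, values in settings.items() if name in values}
        if len(set(per_site.values())) > 1:
            result[name] = per_site
    return result


if __name__ == "__main__":
    for site, name, value in non_default_intervals(posted):
        print(f"{site}: {name} = {value} (default {DEFAULT_CHECK_INTERVAL})")
    for name, per_site in mismatched_between_sites(posted).items():
        print(f"{name} differs between sites: {per_site}")
```

With the values posted above, both DR intervals are flagged as non-default and as different from Production.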
I have the default settings and I also keep seeing both sides disconnect constantly. Does anyone have any other suggestions?
Hey Techno,
If possible, let's increase all the underlined items and see if it clears the error.
localSiteStatus settings:
old - localsitestatus.checkinterval = 60 secs
localsitestatus.eventfrequency = 600 secs
new - localSiteStatus = 300
remoteSiteStatus settings:
remoteSiteStatus.checkinterval = 60 secs
remoteSiteStatus.panicDelay = 5
At DR site:
localSiteStatus settings:
old - localsitestatus.checkinterval = 120 secs
new - localsitestatus.checkinterval = 60 secs
localsitestatus.eventfrequency = 600 secs
remoteSiteStatus settings:
old - remoteSiteStatus.checkinterval = 120 secs
new - remoteSiteStatus.checkinterval = 60 secs
remoteSiteStatus.panicDelay = 5
-rp
Thanks for your reply.
I did that and restarted the SRM service; let's see how it goes.
No joy, the problem still persists.
Same here. I made the change earlier today and it looked promising, but it's still losing connections. Any other suggestions?
Hi,
Let's start tailing or monitoring the SRM logs for the last event right before the disconnect. This should help us narrow down the problem.
Log location:
C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs
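Rather than watching the log live, one way to find "the last event right before the disconnect" after the fact is a small Python script that prints a few lines of context before every log line containing a search term. The directory is the one given above; the `*.log` file pattern and the search term `disconnect` are assumptions, so substitute the file names you actually see in that folder and the wording of the real error.

```python
from collections import deque
from pathlib import Path
from typing import Iterable, Iterator

# Log directory quoted in the post above. The "*.log" pattern is an
# assumption; adjust it to the file names actually present there.
LOG_DIR = Path(r"C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs")


def context_before(lines: Iterable[str], keyword: str, before: int = 5) -> Iterator[list[str]]:
    """Yield each line containing `keyword` (case-insensitive) with up to `before` preceding lines."""
    window: deque = deque(maxlen=before)  # sliding buffer of the most recent lines
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            yield [*window, line]
        window.append(line)


def scan_logs(log_dir: Path, keyword: str, pattern: str = "*.log", before: int = 5) -> None:
    """Print the context before every match in every matching log file."""
    if not log_dir.is_dir():
        print(f"{log_dir} not found")
        return
    for path in sorted(log_dir.glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for block in context_before(fh, keyword, before):
                print(f"--- {path.name} ---")
                print("".join(block), end="")


if __name__ == "__main__":
    # "disconnect" is a hypothetical search term: use text from the real error.
    scan_logs(LOG_DIR, "disconnect")
```

Raising `before` shows more of what SRM was doing leading up to each match.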
-rp
Hello,
I checked your log and noticed you're getting SSL-related timeouts.
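To check whether the SSL handshake to each vCenter is what times out, the following Python sketch opens a TCP connection and times a TLS handshake. Port 443 is an assumption (the usual HTTPS port; change it if your installation differs), and certificate verification is switched off because the aim is to measure connectivity, not trust. A current Python/OpenSSL build may refuse the older protocol versions a 4.1-era server offers, so read the returned error text before blaming the network.

```python
import socket
import ssl
import time


def tls_handshake_time(host: str, port: int = 443, timeout: float = 10.0) -> tuple[bool, float, str]:
    """Connect to host:port, complete a TLS handshake, and return (ok, seconds, detail)."""
    context = ssl.create_default_context()
    # Connectivity check only: vCenter often runs with the self-signed
    # certificate created at install, so skip hostname and chain checks.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                detail = tls.version() or "unknown protocol"
    except OSError as exc:  # refused, timed out, DNS failure and ssl.SSLError all land here
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
    return True, time.monotonic() - start, detail
```

Call it from each site with the host name of each vCenter server, repeatedly over a period of time. An occasional handshake that takes many seconds or fails with a timeout would be consistent with the SSL timeouts in the log.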
Let's try the following:
1. Try connecting to the vCenter web service on both vCenter servers.
2. Try installing the SRM plug-in on a separate management workstation.
3. Run a continuous ping from site A to site B, redirect the output to a file, and look for timeouts (-t keeps pinging until you press Ctrl+C):
C:\> ping -t <site B address> > C:\pingtest.txt
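Once the capture has run for a while, C:\pingtest.txt can get long. The following Python sketch counts replies, timeouts and "unreachable" messages in it and records the line numbers of the timeouts. It assumes English-language Windows ping output, as in the sample lines in the comment; localized Windows versions print different text, so the patterns would need adjusting. Plain ping output carries no timestamps, so the line numbers only show where in the run the drops occurred.

```python
import re
from pathlib import Path

# Patterns assume English Windows ping output such as:
#   Reply from 10.0.0.2: bytes=32 time=1ms TTL=128
#   Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
#   Request timed out.
#   Reply from 10.0.0.1: Destination host unreachable.
REPLY_RE = re.compile(r"^Reply from .*bytes=\d+ time[=<](\d+)ms", re.IGNORECASE)
TIMEOUT_RE = re.compile(r"^Request timed out", re.IGNORECASE)
UNREACHABLE_RE = re.compile(r"unreachable", re.IGNORECASE)


def summarize_ping(lines):
    """Count successful replies, timeouts and unreachable messages in ping output."""
    summary = {"replies": 0, "timeouts": 0, "unreachable": 0, "max_ms": 0, "timeout_lines": []}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        reply = REPLY_RE.match(line)
        if reply:
            summary["replies"] += 1
            # "time<1ms" is recorded as 1 ms, i.e. as an upper bound.
            summary["max_ms"] = max(summary["max_ms"], int(reply.group(1)))
        elif TIMEOUT_RE.match(line):
            summary["timeouts"] += 1
            summary["timeout_lines"].append(lineno)
        elif UNREACHABLE_RE.search(line):
            summary["unreachable"] += 1
    return summary


if __name__ == "__main__":
    capture = Path(r"C:\pingtest.txt")  # file name from the command above
    if capture.exists():
        # The file's exact encoding depends on the system; the lines matched
        # here are plain ASCII, so undecodable bytes are simply replaced.
        with capture.open(encoding="utf-8", errors="replace") as fh:
            print(summarize_ping(fh))
    else:
        print(f"{capture} not found")
```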