VMware Cloud Community
technologist123
Contributor

SRM Error - lost connection

Hi Guys

I have just configured SRM 4 on both sites, Prod and DR. Both sites run ESX 4.1 and vCenter Server 4.1.

Every now and then I get the error messages shown in the attached document. Please advise what went wrong.

vCenter loses its connection to the local SRM server. As a separate issue, the two sites also stop communicating with each other.

Please see the attached document. I have already applied KB article 1029057 regarding Tomcat memory, but no joy.

Thanks

ar1es
Enthusiast

Hey Techno,

What are your localSiteStatus.checkInterval and localSiteStatus.eventfrequency settings?
Also, what is remoteSiteStatus.checkInterval set to?

Thanks,

-rp

mal_michael
Commander

Hi,

Do you have a firewall between the two sites?

Michael.

technologist123
Contributor

Hello mate, thanks for your reply.

At Production:

localSiteStatus settings:
localSiteStatus.checkInterval = 60 secs
localSiteStatus.eventfrequency = 600 secs

remoteSiteStatus settings:
remoteSiteStatus.checkInterval = 60 secs
remoteSiteStatus.panicDelay = 5

At DR site:

localSiteStatus settings:
localSiteStatus.checkInterval = 120 secs
localSiteStatus.eventfrequency = 600 secs

remoteSiteStatus settings:
remoteSiteStatus.checkInterval = 120 secs
remoteSiteStatus.panicDelay = 5

technologist123
Contributor

Michael M. wrote:

Hi,

Do you have a firewall between the two sites?

Michael.

No firewall between the sites; both are on the LAN.

jhrasko
Contributor

Hello,

We are seeing the same issue, with the SRM plug-in losing its connection. Is there a workaround or fix yet? So far I haven't found any confirmed working solution.

Thanks

technologist123
Contributor

jhrasko wrote:

Hello,

We are seeing the same issue, with the SRM plug-in losing its connection. Is there a workaround or fix yet? So far I haven't found any confirmed working solution.

Thanks

I have opened a case with VMware, but their response in this particular case has been hopeless; normally they are good at fixing problems.

AureusStone
Expert

Can you please clarify "both sites stop communicating".  Do you mean SRM, vCenter or WAN?

Edit: Generate a log bundle. Go to the SRM folder in the Start menu and click "Generate Site Recovery Manager Log Bundle".

technologist123
Contributor

AureusStone wrote:

Can you please clarify "both sites stop communicating".  Do you mean SRM, vCenter or WAN?

SRM 

AureusStone
Expert

Okay. Then generate some logs. Also check Event Viewer on your SRM server for any unusual events.

It could be a database issue. Which database are you using? Local or remote? Are any other applications using this database? If it is SQL Server, are you using SQL authentication? Have you configured SQL Server to accept connections on one port, or on a range?
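One quick check from the SRM server is whether the database port accepts TCP connections at all. Below is a minimal Python sketch of that idea; the host name and port in the usage line are assumptions (1433 is SQL Server's usual default, but a named SQL Express instance often listens on a dynamic port, so confirm the real port in SQL Server Configuration Manager first).

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts, DNS failures and timeouts
        # all surface as OSError subclasses in Python 3.
        return False
```

Usage, with a hypothetical host name: `port_accepts_connections("srm-db.example.local", 1433)`. Running it in a loop while the disconnects happen would show whether the database stays reachable; it only tests TCP reachability, not SQL logins.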

technologist123
Contributor

AureusStone wrote:

Okay. Then generate some logs. Also check Event Viewer on your SRM server for any unusual events.

It could be a database issue. Which database are you using? Local or remote? Are any other applications using this database? If it is SQL Server, are you using SQL authentication? Have you configured SQL Server to accept connections on one port, or on a range?

It's the bundled SQL Server 2005 Express, installed locally on both sites. No other apps use this DB; it's purely for SRM.

Windows authentication.

I have checked SQL Server Configuration Manager and Management Studio; both only offer an option to set a single port for communication. There is no range option.

No errors in Event Viewer or vCenter.

VMware held a WebEx session and collected all the logs, and after 2 weeks there has been no reply.

mal_michael
Commander

Why did you change the Advanced Settings on the DR site's SRM?

localSiteStatus.checkInterval = 120

remoteSiteStatus.checkInterval = 120

The default is 60. Set both settings back to 60 and see if the issue persists. A restart of the SRM service is probably required for the changes to take effect.
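To spot this kind of mismatch quickly, here is a small Python sketch that compares the two sites' values (as posted earlier in this thread) against each other and against the default of 60 mentioned in this reply. Only the check-interval default comes from the thread; the sketch deliberately assumes no defaults for the other keys, and the key spellings simply follow the thread.

```python
# Values as posted in this thread (seconds where units were given).
PROD = {
    "localSiteStatus.checkInterval": 60,
    "localSiteStatus.eventfrequency": 600,
    "remoteSiteStatus.checkInterval": 60,
    "remoteSiteStatus.panicDelay": 5,
}
DR = {
    "localSiteStatus.checkInterval": 120,
    "localSiteStatus.eventfrequency": 600,
    "remoteSiteStatus.checkInterval": 120,
    "remoteSiteStatus.panicDelay": 5,
}
# Only the check-interval default (60) is stated in the reply above.
KNOWN_DEFAULTS = {
    "localSiteStatus.checkInterval": 60,
    "remoteSiteStatus.checkInterval": 60,
}


def find_anomalies(prod: dict, dr: dict, defaults: dict) -> list:
    """List keys whose values differ between sites or from a known default."""
    issues = []
    for key in sorted(set(prod) | set(dr)):
        p, d = prod.get(key), dr.get(key)
        if p != d:
            issues.append(f"{key}: Prod={p} DR={d} (sites differ)")
        for site, value in (("Prod", p), ("DR", d)):
            if key in defaults and value is not None and value != defaults[key]:
                issues.append(f"{key}: {site}={value} (default {defaults[key]})")
    return issues


if __name__ == "__main__":
    for line in find_anomalies(PROD, DR, KNOWN_DEFAULTS):
        print(line)
```

With the values above, only the two checkInterval keys are flagged, each once for the site mismatch and once for DR differing from 60.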

serged
Contributor

I have the default settings, and I also keep seeing both sides disconnect constantly. Does anyone have any other suggestions?

ar1es
Enthusiast

Hey Techno,

If possible, let's change all the items marked old/new below and see if that clears the error.

At Production:

localSiteStatus settings:
old - localSiteStatus.checkInterval = 60 secs
localSiteStatus.eventfrequency = 600 secs
new - localSiteStatus = 300

remoteSiteStatus settings:
remoteSiteStatus.checkInterval = 60 secs
remoteSiteStatus.panicDelay = 5

At DR site:

localSiteStatus settings:
old - localSiteStatus.checkInterval = 120 secs
new - localSiteStatus.checkInterval = 60 secs
localSiteStatus.eventfrequency = 600 secs

remoteSiteStatus settings:
old - remoteSiteStatus.checkInterval = 120 secs
new - remoteSiteStatus.checkInterval = 60 secs
remoteSiteStatus.panicDelay = 5

-rp

technologist123
Contributor

Thanks for your reply.

I did that and restarted the SRM service; let's see how it goes.

technologist123
Contributor

Adeel wrote:

Thanks for your reply.

I did that and restarted the SRM service; let's see how it goes.

No joy; the problem still persists.

serged
Contributor

Same here. I made the change earlier today and it looked promising, but it's still losing connections. Any other suggestions?

ar1es
Enthusiast

Hi,

Let's start tailing or monitoring the SRM logs for the last event right before the disconnect. This should help us narrow down the problem.

Log location:

C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs
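If you'd rather not watch the log live, a small Python sketch can pull out the lines just before the last disconnect after the fact. The search pattern (`disconnect|timed out`) and the `*.log` file mask are assumptions; adjust them to whatever the disconnect entry in your SRM log actually looks like.

```python
import re
from collections import deque
from pathlib import Path
from typing import List, Optional


def newest_log(log_dir: str, mask: str = "*.log") -> Optional[Path]:
    """Return the most recently modified file matching `mask`, or None."""
    files = list(Path(log_dir).glob(mask))
    return max(files, key=lambda p: p.stat().st_mtime) if files else None


def context_before(log_path, pattern: str, lines_before: int = 20) -> List[str]:
    """Return up to `lines_before` lines preceding the LAST line matching
    `pattern` (case-insensitive), followed by that line; [] if nothing matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    window = deque(maxlen=lines_before)  # rolling buffer of the previous lines
    last_hit: List[str] = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if regex.search(line):
                last_hit = list(window) + [line]
            window.append(line)
    return last_hit


if __name__ == "__main__":
    # Log folder from the post above; the pattern is a hypothetical example.
    log = newest_log(r"C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs")
    if log is not None:
        for entry in context_before(log, r"disconnect|timed out"):
            print(entry)
```

The rolling buffer keeps memory flat even on large logs, since only the last `lines_before` lines are held at any time.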

-rp

serged
Contributor

Here is the latest log. This morning, when connected to vCenter, I had to reconnect to SRM.

Thanks

ar1es
Enthusiast

Hello,

I checked your log and noticed you're getting SSL-related timeouts.
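To see whether the SSL handshake to each vCenter is itself slow, here is a minimal Python sketch that times a TCP connect plus TLS handshake. Certificate verification is switched off purely for timing, on the assumption that these 4.x servers use self-signed certificates. Note also that a current Python/OpenSSL build may refuse the older TLS versions an older server offers; in that case you get an `ssl.SSLError` about the protocol rather than a timing.

```python
import socket
import ssl
import time


def time_tls_handshake(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Return the seconds taken by TCP connect plus TLS handshake.

    Raises OSError (including ssl.SSLError and socket timeouts) on failure.
    """
    ctx = ssl.create_default_context()
    # Verification is disabled ONLY because this measures timing; do not reuse
    # this context for anything that trusts the peer. check_hostname must be
    # turned off before verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket performs the handshake immediately (do_handshake_on_connect
        # defaults to True), bounded by the socket timeout set above.
        with ctx.wrap_socket(raw, server_hostname=host):
            pass
    return time.monotonic() - start
```

Usage, with hypothetical host names: call `time_tls_handshake("vc-prod.example.local")` and `time_tls_handshake("vc-dr.example.local")` from each site and compare; consistently slow or failing handshakes toward one side point at the network path rather than SRM itself.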

Let's try the following:

1. Try connecting to the vCenter web service on both vCenter servers:

     https://hostname

2. Try installing the SRM plug-in on a separate management workstation.

3. Run a continuous ping from site A to site B, redirect the output to a file, and look for timeouts (stop the ping with Ctrl+C):

     ping -t hostname > C:\pingtest.txt
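Once pingtest.txt has collected some data, a Python sketch like this can count replies against timeouts instead of scrolling through the file. It assumes English-language Windows ping output ("Reply from ...", "Request timed out."); other Windows locales print different text, so the matching would need adjusting.

```python
import re
from pathlib import Path

# Assumes English-language Windows ping output.
REPLY_RE = re.compile(r"^Reply from .*time[=<](\d+)ms", re.IGNORECASE)


def summarize_ping_log(text: str) -> dict:
    """Count replies, timeouts and unreachable responses; track the worst RTT."""
    stats = {"replies": 0, "timeouts": 0, "unreachable": 0, "max_ms": 0}
    for raw in text.splitlines():
        line = raw.strip()
        if line.lower().startswith("request timed out"):
            stats["timeouts"] += 1
        elif "unreachable" in line.lower():
            stats["unreachable"] += 1
        else:
            match = REPLY_RE.match(line)
            if match:
                stats["replies"] += 1
                # "time<1ms" is recorded as 1 ms, an upper bound.
                stats["max_ms"] = max(stats["max_ms"], int(match.group(1)))
    return stats


def summarize_ping_file(path: str) -> dict:
    # errors="replace" because redirected cmd output may use the console's
    # OEM code page rather than UTF-8.
    return summarize_ping_log(Path(path).read_text(errors="replace"))
```

Usage: `summarize_ping_file(r"C:\pingtest.txt")`. Any non-zero `timeouts` or `unreachable` count lined up against the times SRM disconnected would point at the network between the sites.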
