VMware Cloud Community
r00tkit
Contributor

vSphere Server to ESXi3.5 and 4 Disconnects over VPN.

I need some help. I've been racking my brain about this for some time. I found some tips saying it was the SSL cert, this, that, and the other.

OK, so in vSphere we have a few 'datacenters':

DMZ

Hospitals

Prod

Dev

DMZ, Prod, and Dev are all local; DMZ crosses our firewall/VPN locally.

Now for the hospitals: they cross a managed VPN from ATT, and as we deploy/convert each hospital we are going to have a problem. In the pic I attached you can see that ONLY the hospital ESXi servers are disconnecting. Once I add them they are OK for about 5 minutes or less, then nothing. I then have to re-add the server.

If I connect directly to the server I can stay connected for days without any problem. Keep in mind our VPN tunnels don't disconnect if there is no data; they stay up. If by any chance the DSL/cable at the hospital goes down, we have an EVDO card installed in their routers.

Over the wire we have AD, web, monitoring, disk shares, etc. all working OK over the VPN. ALL ports are open to each site, though the return path is restricted. I believe we have all the required ports open.

Any ideas???

Thanks!!

12 Replies
AndreTheGiant
Immortal

Have you looked into the ESX and/or VC log files?

Also, is ping between the ESX console and VC stable?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
r00tkit
Contributor

From what I can see there are no errors on either side, and it's only happening on the VPN side of things.

It makes me wonder whether the VPN is doing it, yet when I log into the ASA the tunnel has been solid for days.

There is a stable connection to the hospital, with an average round-trip time of 53 ms; the highest is in the 70s.

Cacti reports 98% uptime on the router.

DSTAVERT
Immortal

Is this a new problem in an old solution (one that was working), or a new installation complete with a new problem?

-- David -- VMware Communities Moderator
DSTAVERT
Immortal

I think I would want better than 98% uptime. If you are paying ATT for a managed VPN, have them verify the VPN.

-- David -- VMware Communities Moderator
r00tkit
Contributor

It's been a problem, and I thought I had it figured out when I saw a few postings pointing to an SSL problem, but that wasn't it.

The main problem we face with uptime is that ATT deals with the local ISP if they can't put their own line in; at this one site, for example, it's a Cox cable line. The 98% is based on installing the air card and testing it for this hospital.

Even over our MPLS the same problem occurs.

DSTAVERT
Immortal

Start from the simple. Verify that the VPN connects and stays connected. Run something across the connection and verify there are no dropped packets. What ports do you have open? Is this a router-to-router VPN? Can you set up a client-based VPN?
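One way to "run something across the connection" is a small echo test that holds a TCP session open, lets it sit idle, and then checks whether data still makes the round trip. The following is a minimal sketch, assuming a Python listener can be run on a machine at the remote site; the function names, the port, and the idle interval are illustrative and do not come from this thread.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read up to n bytes, stopping early only if the peer closes."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf


def serve_echo(listener):
    """Accept connections on `listener` and echo back whatever arrives."""
    def handle(conn):
        try:
            with conn:
                while True:
                    data = conn.recv(1024)
                    if not data:
                        break
                    conn.sendall(data)
        except OSError:
            pass  # peer reset or dropped; nothing more to echo

    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            break  # listener was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def survives_idle(host, port, idle_seconds, timeout=5.0):
    """Return True if a TCP connection still echoes data after sitting idle."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"before")
            if recv_exact(sock, 6) != b"before":
                return False
            time.sleep(idle_seconds)  # no traffic during this window
            sock.sendall(b"after")
            return recv_exact(sock, 5) == b"after"
    except OSError:  # covers timeouts, resets, and refused connections
        return False
```

Running `serve_echo` at the hospital side and calling `survives_idle` from the VC side with increasing idle intervals (for example 60, 300, 900 seconds) would show whether something in the path drops idle sessions: a connection that echoes immediately but fails after a quiet period points at a session timeout rather than packet loss.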

-- David -- VMware Communities Moderator
r00tkit
Contributor

As of right now, I have opened all traffic between the site and our DNS and VC servers, in both directions.

On the ASA I can see the tunnel has been up for days.

I added the site again. It adds, and I see the VMs.

Detect VUM GuestAgent | esxivc. | 50% | esxivc. | 7/15/2009 4:52:42 PM | 7/15/2009 4:52:42 PM

When this hits 50%, the server shows as not responding. Keep in mind I can still ping the system. The detect does finish.

I am on the network, so I don't need a client.

As for the ports, I followed this doc, http://www.vmware.com/pdf/vi3_301_201_server_config.pdf, around page 180.
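A quick way to confirm that the ports from the guide are actually reachable across the VPN is to attempt a TCP connect to each one. This is a minimal sketch; the function name is illustrative, and the port list passed in is an assumption to be checked against the guide (TCP 443 and 902 are the ports commonly cited for vCenter to host traffic).

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: bool} showing whether a TCP connect to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable, or timed out
            results[port] = False
    return results
```

Calling `check_tcp_ports("<hospital-esxi-host>", [443, 902])` from the VC server covers only the outbound TCP direction. It says nothing about UDP traffic or about traffic the host initiates back toward the VC server, which matters here because the return path is restricted; host heartbeats are commonly documented as UDP 902 toward the VC server, and that direction would need to be verified separately.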

s1xth
VMware Employee

Just curious: what version of ESX are you running? (I know 3.5, but U3 or U4, latest FW?) Also, what version of vCenter?

Edit: my bad, I re-read all this more closely. So you are connecting to ESXi 3.5 and 4.0 servers via vSphere vCenter. I am still going to ask: what version of 3.5 is running on the hospital side?

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi
r00tkit
Contributor

At the hospitals we are running 3.5 U4 HP.

We tested 4, but with the lack of hardware health monitoring we decided not to...

HP DL350 Gen5 and Gen6.

DSTAVERT
Immortal

There is a version of 4 for HP, but you get it directly from HP. It includes health. Beside the point here.

-- David -- VMware Communities Moderator
r00tkit
Contributor

I just saw that. I will give that a go on our Gen6.

I think there is a NAT problem between the hospital and corp. I will raise the debug level on the VPN.

Subatomic
Enthusiast

Have you checked whether you are getting out-of-state packets?

I'm not sure if this is related. There was an issue with ESX and Virtual Center 2.5 which was fixed in Virtual Center 2.5 U5. If you are seeing this issue in vCenter 4 then I'd probably try setting up some dummy guests as described in the vmwarewolf link. 2.5 U5 was released after vCenter 4, so vCenter 4 may have the same inherent issue.

Resolved Issues

This release resolves the following issue:

*A Firewall Between ESX Server Hosts and VirtualCenter Server Might Drop an Idle HTTP Connection Between the Hosts and the VirtualCenter Server and Cause Errors*

This release resolves an issue where if the HTTP connection between a remote ESX Server host and the VirtualCenter Server is idle for more than 30 minutes, the firewall with default session timeout policy drops the idle HTTP connection. The dropped connection causes the host to be disconnected from the VirtualCenter Server and reconnected later. When the connection is dropped, initiating any operations on the remote ESX Server host causes the VirtualCenter Server to display an error message similar to the following:

An error occurred while communicating with the remote host.

A new advanced setting entry vpxd.httpClientIdleTimeout can be used to configure the timeout value for an idle HTTP connection. The default value for this entry is 15 minutes (900 seconds), ensuring that the VirtualCenter Server closes the idle HTTP connection after the connection has been idle for 15 minutes. If a firewall session timeout value is set to less than 15 minutes, the value for vpxd.httpClientIdleTimeout should be changed to be smaller than the firewall's timeout value.
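The rule in the last paragraph of the release note can be written out as a small calculation. This is a sketch only: the function name and the 60-second safety margin are assumptions, values are taken to be in seconds as the "900 seconds" default suggests, and how the resulting value is applied to vpxd.httpClientIdleTimeout is not shown here.

```python
DEFAULT_IDLE_TIMEOUT_S = 900  # 15 minutes, the default quoted in the release note


def suggest_idle_timeout(firewall_timeout_s, margin_s=60):
    """Pick an idle timeout (seconds) that stays below the firewall's session timeout.

    margin_s is a hypothetical safety margin, not a documented value.
    """
    if firewall_timeout_s < 2:
        raise ValueError("firewall timeout must be at least 2 seconds")
    if firewall_timeout_s > DEFAULT_IDLE_TIMEOUT_S:
        # The default already expires before the firewall drops the session.
        return DEFAULT_IDLE_TIMEOUT_S
    candidate = firewall_timeout_s - margin_s
    if candidate < 1:
        # Margin is larger than the timeout itself; fall back to half of it.
        candidate = firewall_timeout_s // 2
    return candidate
```

For a firewall that drops idle sessions after 5 minutes (300 seconds), this suggests 240 seconds; for a firewall timeout of 30 minutes it leaves the 900-second default alone.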

If the comments were useful, please consider awarding points for helpful or correct. Thanks - SA -