VMware Cloud Community
racom
Enthusiast

"No connection to VR Server: Not responding." after upgrade to vSphere 5.5

In the course of upgrading vSphere 5.1 to 5.5 following http://kb.vmware.com/kb/2057795, I updated vSphere Replication to 5.5.0.0 Build 1309877. All replications continued to work after that. I only added replication for the new vCenter and stopped it for the old one.

After migrating all VMs, including the VR appliance, to the other ESXi host, which had just been upgraded to 5.5, I upgraded the original host and migrated some VMs back. I then noticed that only these VMs are still being replicated. All the others (including the VR appliance) show "No connection to VR Server: Not responding." and "RPO violation" messages. If I migrate VMs to the original host, replication is restored. If I migrate them to the other host, the connection to the VR server is lost again, regardless of which host the VR server is running on.

Any ideas, please?

7 Replies
racom
Enthusiast

I forgot to say we are using replication in a Single vCenter Server Instance.

mikez2
VMware Employee

Ok, I'm somewhat unclear on what exactly the situation is.

Let's call the host you upgraded from 5.1 to 5.5 Host A and the host that was already upgraded to 5.5 Host B.

1. You migrated the VM's and the VR appliance from Host A to Host B.

2. You upgraded Host A from 5.1 to 5.5

3. You migrated some of the VMs (but not the VR server?) from Host B back to Host A.

4. The VM's back on Host A work but the ones on Host B don't work?

5. If you migrate the other VMs back to Host A they work, but migrating the VMs to any host besides Host A doesn't work.

Is that right?

Double check that your upgraded setup still allows network traffic to flow from the affected servers to the VR server.

Did you have replication set to go out to a particular vmknic under 5.1?

racom
Enthusiast

Yes, that's right. But it relates only to replication. If VMs are migrated off Host A, they lose their connection to the VR server and replication stops. Even the VR server loses its connection to itself if it is not running on Host A. The VR server was running on Host A during the update to 5.5.0.0.

I've tried to configure replication settings for the VMs on Host B, but nothing changed. If I migrate any previously configured VM to Host A, replication is restored without any new configuration. And VMs on Host A are replicated even if the VR server is migrated to Host B and its own replication has stopped.

I can ping both replicated and non-replicated VMs as well as both hosts from the VR server.

I'm not sure I understand your last question correctly. There are two vmnics on the VR server. The first one is connected to the LAN with the VMs. The second one is connected via a VLAN to the NAS where the replicas are stored. We are using replication in a Single vCenter Server Instance.

mikez2
VMware Employee

racom wrote:

Even the VR server loses its connection to itself if it is not running on Host A.

Ok, so if the VR server is not running on Host A, then you can't even ping the VR server's address from the VR server itself?

I can ping both replicated and non-replicated VMs as well as both hosts from the VR server.

But you cannot ping the VR server from the hosts or can you?

Do you have some kind of special routing setup that maybe didn't make its way through the upgrade process?

racom
Enthusiast

mikez2 wrote:

Ok, so if the VR server is not running on Host A, then you can't even ping the VR server's address from the VR server itself?

But you cannot ping the VR server from the hosts or can you?

Do you have some kind of special routing setup that maybe didn't make its way through the upgrade process?

All pings work regardless of where the VR server is running. I can ping the VR server's LAN IP as well as the localhost IP from the VR server itself. I can ping the VR server from both hosts as well as from VMs with active and inactive replication.

It doesn't look like a network problem from my point of view. I suspect it is related to PR1121196, which I found here: http://kb.vmware.com/kb/2062311. If I migrate the VR server to Host B, I see these messages in /var/log/vmkernel.log on that host:

2014-01-22T09:25:13.012Z cpu22:1548475)WARNING: Hbr: 549: Connection failed to 192.168.20.19 (groupID=GID-7422a6fc-6a67-469b-97e6-3c1e76315431): Timeout

2014-01-22T09:25:13.012Z cpu22:1548475)WARNING: Hbr: 4521: Failed to establish connection to [192.168.20.19]:44046(groupID=GID-7422a6fc-6a67-469b-97e6-3c1e76315431): Timeout

But I can't find the vSphere Replication traffic checkbox in the vSphere Web Client (could it be a feature of vSphere 5.1 only?). The corresponding patches are for ESXi 5.1, and both hosts have already been updated to ESXi 5.5. I can see these patches in vSphere Update Manager, but the hosts appear compliant after a scan.
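
For anyone comparing hosts, the Hbr warnings quoted in this post can be pulled out of a saved copy of vmkernel.log with a small script. This is only a sketch: the pattern is modeled on the two lines shown here, the wording of Hbr messages on other builds is an assumption, and find_hbr_failures is a made-up helper name.

```python
import re

# Pattern modeled on the two warnings quoted in the post. Whether other
# ESXi builds word these messages the same way is an assumption.
HBR_FAILURE = re.compile(
    r"^(?P<time>\S+) \S+\)WARNING: Hbr: \d+: "
    r"(?:Connection failed to|Failed to establish connection to) "
    r"\[?(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]?(?::(?P<port>\d+))?\s*"
    r"\(groupID=(?P<group>[^)]+)\): (?P<reason>.+)$"
)


def find_hbr_failures(log_text):
    """Return one dict per Hbr connection-failure warning found in log_text."""
    failures = []
    for line in log_text.splitlines():
        # Tolerate surrounding whitespace and the quotes used when pasting logs.
        match = HBR_FAILURE.match(line.strip().strip('"'))
        if match:
            failures.append(match.groupdict())
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python hbr_failures.py vmkernel.log  (a copy taken from the host)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for failure in find_hbr_failures(handle.read()):
            print(failure["time"], failure["ip"], failure["port"], failure["reason"])
```

Running it against the log from each host makes it easy to see which host reports timeouts and which VR server address and port it was trying to reach.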

mikez2
VMware Employee

Yep, that's possible, but I just wanted to rule out any simpler explanations first. I was particularly worried when you said the VR server couldn't ping itself, but maybe I misinterpreted what you were saying.

In any case, you can check to see if this is your problem by looking for a line like this in the esx.conf of the hosts that you upgraded:

/net/vmkernelnic/child[0001]/tags/4 = "true"

If so, delete the line from esx.conf and reboot the host. Then, when the host is back, double check that the NIC assigned to the vmkernel has a reachable IP address.

I believe the "child[0001]" part may be different on some setups so if you don't find the exact line I specified above, check around for lines with a different child[] part.
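
If you have several hosts to check, the search described above can be scripted. The sketch below only reports matching lines from a copy of esx.conf, for any child[] index; it does not edit the file, so the deletion and reboot remain manual steps. The helper name find_tag4_lines is made up for illustration, and the path /etc/vmware/esx.conf in the usage comment is an assumption about where the file lives on your hosts.

```python
import re

# Matches e.g.  /net/vmkernelnic/child[0001]/tags/4 = "true"
# with any digits inside child[...], since that part may differ per setup.
TAG_LINE = re.compile(r'^/net/vmkernelnic/child\[(\d+)\]/tags/4\s*=\s*"true"\s*$')


def find_tag4_lines(conf_text):
    """Return (line_number, child_index, line) for every matching line."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        match = TAG_LINE.match(stripped)
        if match:
            hits.append((number, match.group(1), stripped))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_tag4.py esx.conf
    # (a copy of the host's config; /etc/vmware/esx.conf is the assumed source)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, child, line in find_tag4_lines(handle.read()):
            print(f"line {number}: child[{child}]: {line}")
```

No output means no such line was found in that copy of the file.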

racom
Enthusiast

It works! After removing the line from esx.conf and rebooting Host B, replication of all VMs on both hosts continues. Many thanks.