VMware Cloud Community
kruddy
Enthusiast

HmsGroup.OnlineSync Fail - unknown error occurred

Has anyone else seen this error with VRMS?

I have two sites set up in SRM 5.

Each site has a VRM server deployed, configured, and connected.

Each site has a single vSphere Replication server deployed and registered as well.

The Summary tab shows them as connected via IP (I did not use DNS because of other problems I had read about). One site shows 2 source VMs, the datastore mappings are correct, and permissions are correct (run as admin).

When I tell it to "synchronize", I receive: "'HmsGroup.OnlineSync' for object 'GID....' on Server 'xxx.xxx.xxx.xxx' failed. An unknown error has occurred."

Has anyone else run into this, or can anyone give me some pointers on how to troubleshoot from here?

Thanks.

Please, don't forget the awarding points for "helpful" and/or "correct" answers.
15 Replies
admin
Immortal

I would start by looking at the primary-side VRMS logs. Perhaps you can attach the log file that contains the error.

Thanks

Sudarsan

kruddy
Enthusiast

I'm looking in the /opt/vmware/logs/hms folder for the hms.log file; however, when I hit the "Synchronize Now" button, nothing appears to be written to that log.

Any other logs I should be looking for?

kruddy
Enthusiast

After some further testing, it looks like I can replicate from Site 2 to Site 1 just fine. However, Site 1 to Site 2 is not working.

I'm going to guess firewall problems.

One other thing I noticed: the icons are different. I would not have expected that. I know icons can be modified, but that has not been done here. Both indicate that they are Managed by VR Management.

The exact same ISOs and EXEs were used for both sites, ESXi, vCenter, SRM, even SQL.

rlquilez
Contributor

Hi Kruddy,

I'm having the same problem. Did you find a solution?

Thanks

kruddy
Enthusiast

Sorry for the delay.

As best as I can tell, the error was related to a firewall configured improperly.

However, I am having further failures now...

Basically, it does the initial sync, then never syncs again and gives me the same "HmsGroup.OnlineSync" error.

Time to dig through the logs again.

kruddy
Enthusiast

The events say that the VM is in an "Invalid State".

Since it's powered on, VMware Tools is running, and it's functioning properly, I'm not sure what that could be referring to.

kruddy
Enthusiast

Found these in the logs:

2012-01-04T18:58:34.533Z [F5993B70 error 'Main' opID=hsl-a0edc478]    [0] ExcError: exception N3Vim5Fault12FileTooLarge9ExceptionE: vim.fault.FileTooLarge
2012-01-04T18:58:34.533Z [F5993B70 error 'Main' opID=hsl-a0edc478]    [1] Code set to: Generic storage error.
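
A quick way to pull fault names out of lines like the two above is a small parser. This is only a sketch: the pattern is inferred from these two sample lines (timestamp, bracketed thread ID, level, component, optional opID), so other log lines may not match it, and the names `parse_line` and `find_errors` are illustrative helpers, not part of any VMware tool.

```python
import re

# Layout assumed from the two sample lines: timestamp, then
# [thread level 'component' opID=...], then the message text.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+\[(?P<thread>\S+)\s+(?P<level>\w+)\s+'(?P<component>[^']*)'"
    r"(?:\s+opID=(?P<opid>[^\]\s]+))?\]\s*(?P<msg>.*)$"
)
# Fault names as they appear in the message, e.g. vim.fault.FileTooLarge.
FAULT_RE = re.compile(r"\bvim\.fault\.(\w+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    fault = FAULT_RE.search(rec["msg"])
    rec["fault"] = fault.group(1) if fault else None
    return rec


def find_errors(lines):
    """Return the parsed records whose level is 'error'."""
    return [r for r in map(parse_line, lines) if r and r["level"] == "error"]
```

Feeding the lines of a log file to `find_errors` and grouping the results by `opid` would show which faults belong to the same operation.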

Did I miss a size limitation?

I'm not seeing any volumes on the SAN filling up on either side, so I don't think it's a problem on the SAN.

Guess I'll create a case and see what they have to say!

RicF
Contributor

Hi kruddy

Was this resolved? I am facing a similar issue with just a single VM. I have a mix of converted VMs and indigenous VMs. I can start replication on both types of VMs at both sites, except for one VM, which fails with the same error you described.

I've removed it from inventory and re-added it, and I've moved the VMDK files around since they were on different datastores, but no luck.

anupampushkar
Contributor

Same problem here. Any solution?

vilinski
VMware Employee

What reason do you see for the error?

Windspirit
Hot Shot

Hi all,

I just had a client where the ESXi hosts are in a different subnet than the vRA and the vCenter. This resulted in the same error: HmsGroup.OnlineSync

After opening up TCP 31031 and 44046 between the ESXi servers on BOTH sides we were able to get it working.

So it looks like the vRA AGENTS on the ESXi servers are syncing with each other, NOT with the vRA itself.

The problem with the whole thing is that the documentation on all this is very light. KB 1009562 suggests that you have to open these ports between the ESXi hosts and the vRA appliance at the REMOTE site; however, this doesn't seem to be true, as we just tested it.

If anyone has an in-depth understanding of how vRA replication works in detail (ports, etc.), please let me know.

regards

D

anupampushkar
Contributor

The issue was resolved for me when we upgraded the firewall. We used Wireshark to monitor the connection and found that the firewall was causing the problem.

AnthBro
Enthusiast

So you are on the right track.

The short answer is that replication involves both host-to-host and vRA-to-host traffic.

During the first full synchronisation, and only then, the ESXi host with the source machine replicates directly to the ESXi host with the destination machine.

Once the initial synchronisation is complete, I'm less clear on how incremental changes are synced. They originate on the vRA; I'm not sure whether they go to the target vRA or straight to the target host, but I think vRA to vRA.

Hence there are ports required for host to host, vRA to opposite-side host, and vRA to vRA.

Any views or opinions presented in this post are solely those of the author and do not necessarily represent those of the company he works for.
mvalkanov
VMware Employee

AnthBro wrote:

The short answer is that replication involves both host-to-host and vRA-to-host traffic.

During the first full synchronisation, and only then, the ESXi host with the source machine replicates directly to the ESXi host with the destination machine.

Once the initial synchronisation is complete, I'm less clear on how incremental changes are synced. They originate on the vRA; I'm not sure whether they go to the target vRA or straight to the target host, but I think vRA to vRA.

Hence there are ports required for host to host, vRA to opposite-side host, and vRA to vRA.

Hi,

This is not correct. VR does not directly talk ESXi to ESXi.

ESXi talks LWD protocol to VR server on ports 31031 and 44046.

VR server talks NFC protocol (port 902) to target ESXi to access the target datastores. VR server uses port 80 of target ESXi to list the accessible target datastores.
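
The flows described above can be spot-checked with a plain TCP connect test. This is a minimal sketch, not a VMware tool: the `FLOWS` table only restates the ports listed in this post, `check_port`, `ports_for`, and `report` are hypothetical helpers, and any host address you pass in is a placeholder for your own environment. A successful connect only shows that the port is reachable from the machine running the script, so it should be run from the source side of each flow (or from a machine in the same subnet behind the same firewall).

```python
import socket

# Flows as listed in the post above: (source role, destination role, TCP port).
FLOWS = [
    ("source ESXi", "VR server", 31031),
    ("source ESXi", "VR server", 44046),
    ("VR server", "target ESXi", 902),
    ("VR server", "target ESXi", 80),
]


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def ports_for(destination):
    """TCP ports that must be open toward the given destination role."""
    return [port for _, dest, port in FLOWS if dest == destination]


def report(host, destination):
    """Map each port required for this destination role to True/False."""
    return {port: check_port(host, port) for port in ports_for(destination)}
```

For example, calling `report("192.0.2.10", "VR server")` from the source ESXi subnet (with the placeholder address replaced by the real VR server address) would show whether 31031 and 44046 get through the firewall.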

If you think that some port is missing in the KB article - please provide exact numbers and corresponding parties - ESXi, VR appliance, etc. - so that the KB article can be fixed if necessary.

Regards,

Martin

yassinos
Contributor

Hi all,

I had the same error. I verified that the ESXi host couldn't reach the VR subnet; after fixing the network issue, the initial replication sync completed properly.

Thank you
