VMware Cloud Community
bshubinsky
Hot Shot

Random, frequent vCenter disconnects

Recently we've been plagued with random, frequent disconnects of our hosts from vCenter. It doesn't appear to affect the VMs themselves (I never lose a ping and never have connection issues), but it does affect VMware View connections. If I have a console window open when it happens, the console goes black and then comes back. The disconnects usually last only a few seconds.

You can see from the performance stats how often it happens:

vcenter1.png

The event log also shows a bunch of 'Grey to Green' status messages, along with mks ticket messages.

vcenter2.png

Can anyone spare some advice?

14 Replies
EVW
Enthusiast

Are there any errors on the Virtual Center server? Check the Event Viewer for clues about network card disconnects/reconnects. Have you installed recent NIC drivers?

Can you get statistics from the switch port about the port status, such as packet loss? Are there any issues with the UTP cable?

idle-jam
Immortal

Is your vCenter a virtual machine or a physical machine? If it's a physical machine, could you run an extended ping check and see whether any pings time out? We can narrow it down from there.

Also, how do the Windows event logs on the vCenter server look?

gekko
Enthusiast

Is there a firewall between the vCenter server and the service console / management network?

Do the disconnects happen more often during the day or during the night?

-Kenth

bshubinsky
Hot Shot

Our vCenter is a virtual machine.

bshubinsky
Hot Shot

There is no firewall in between. Judging by the pictures posted above, I'd say it happens at a pretty regular interval - not specific to day or night.

gekko
Enthusiast

bshubinsky
Hot Shot

That didn't help but I think I may have found the issue. Apparently there was no limit set on the transaction log for the VCDB database and the transaction log has grown to a whopping 36 gigs. I'm guessing it's a timeout issue when writing to the database.

Since I only have about 9 gigs free, does anyone have any tips on shrinking the transaction log with so little free space?

gekko
Enthusiast

Ah, okay =).

What recovery model is your database set to? Simple, Full, or Bulk-logged?

- Kenth

bshubinsky
Hot Shot

Bulk.

bshubinsky
Hot Shot

So I ended up doing a full DB backup then a transaction log backup. After the backup I ran a shrink and got the filesize down to something manageable (it's currently at about 20MB).
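For anyone following along, the backup, log backup, shrink sequence described above can be sketched roughly as below. This is only a sketch that builds the T-SQL statements as strings so they can be reviewed before being run in a query tool; the logical log file name `VCDB_log`, the backup directory, and the target size are assumptions for illustration and are not taken from this thread. `BACKUP DATABASE`, `BACKUP LOG`, and `DBCC SHRINKFILE` are standard SQL Server commands, and the `DBCC SHRINKFILE` target size is given in MB.

```python
def log_shrink_statements(database, log_logical_name, backup_dir, target_mb):
    """Return the T-SQL statements, in order, for a full backup,
    a transaction log backup, and a shrink of the log file.

    All arguments are supplied by the caller; nothing is executed here.
    """
    full_backup = f"{backup_dir}\\{database}_full.bak"
    log_backup = f"{backup_dir}\\{database}_log.trn"
    return [
        # 1. Full database backup (a log backup needs a prior full backup).
        f"BACKUP DATABASE [{database}] TO DISK = N'{full_backup}';",
        # 2. Log backup, which lets the inactive part of the log be reused.
        f"BACKUP LOG [{database}] TO DISK = N'{log_backup}';",
        # 3. Shrink the log file to the target size (in MB).
        f"USE [{database}]; DBCC SHRINKFILE (N'{log_logical_name}', {target_mb});",
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust the log name, path, and size to your setup.
    for statement in log_shrink_statements("VCDB", "VCDB_log", r"E:\backups", 512):
        print(statement)
```

The backup directory is assumed to be on a volume with enough free space to hold the backups; the actual logical name of the log file can be looked up in the database's file properties.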

Still the same problem.

I turned off the firewall on the vCenter VM based on some suggestions I saw.

Still the same problem.

Does anyone have any more suggestions? I'm at my wits' end here.

bshubinsky
Hot Shot

Another update.

I sent the logs to VMware Support twice, but I still have no answer on why the hosts are disconnecting. I installed the latest patches on the ESXi servers (in the hope that it might fix something) and removed and re-added the hosts using their FQDNs.

Still the same issue.

bshubinsky
Hot Shot

I hate updating my own posts, but I figured keeping a running tally would help anyone else who has this issue in the future. I ran a continuous ping to both hosts from the vCenter server. All went well enough: a bunch of 1ms, sometimes 2ms pings. I went to lunch, came back, and saw that the host had done the drop thing again. Going through the pings, I saw that one host (the one hosting vCenter) had a single ping of 120ms.

Now, I wouldn't imagine a ping of 120ms would completely drop the connection to vCenter but it seems that is the only latency spike around. Both hosts plug into the same switch and are on the same VLAN.
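Scrolling through hours of ping output by hand is error-prone, so a small script along these lines could flag the slow replies and timeouts. It is a minimal sketch, assuming the ping output was saved to a text file and that the lines look like standard Windows or Linux ping replies; the 50 ms threshold and the `192.0.2.10` address in the sample are illustrative assumptions only.

```python
import re

# Matches "time=120ms", "time<1ms" (Windows) and "time=0.5 ms" (Linux).
_TIME_RE = re.compile(r"time[=<](\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def find_latency_spikes(ping_lines, threshold_ms=50.0):
    """Return (line_number, rtt_ms) for replies slower than threshold_ms,
    and (line_number, None) for lines that report a timeout."""
    spikes = []
    for number, line in enumerate(ping_lines, start=1):
        match = _TIME_RE.search(line)
        if match:
            rtt = float(match.group(1))
            if rtt > threshold_ms:
                spikes.append((number, rtt))
        elif "timed out" in line.lower():
            spikes.append((number, None))
    return spikes


if __name__ == "__main__":
    # Hypothetical sample in Windows ping format.
    sample = [
        "Reply from 192.0.2.10: bytes=32 time=1ms TTL=64",
        "Reply from 192.0.2.10: bytes=32 time=120ms TTL=64",
        "Request timed out.",
    ]
    for line_number, rtt in find_latency_spikes(sample):
        print(line_number, "timeout" if rtt is None else f"{rtt} ms")
```

Comparing the flagged line positions against the disconnect times in the vCenter event log would show whether the latency spike and the drop actually coincide.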

ITRC_Architect
Contributor

Did you find the problem?

We have very similar issues that VMware can't seem to diagnose. I would point out that your disconnects are not really "random" but appear to recur about every 30 minutes after the host reconnects (which is just like our symptoms).

I suspect this is related to the 30-minute performance database work that vCenter performs, but I can't pin it down.
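One way to test the "every 30 minutes" observation is to take the disconnect timestamps from the event log and look at the gaps between them. The sketch below assumes the timestamps have been exported as strings in `YYYY-MM-DD HH:MM:SS` form; the sample values and the 2-minute tolerance are made up for illustration and are not from either environment.

```python
from datetime import datetime


def disconnect_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps, in minutes, between consecutive disconnect events."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def looks_periodic(intervals, period_min=30.0, tolerance_min=2.0):
    """True if every gap is within tolerance_min of period_min."""
    if not intervals:
        return False
    return all(abs(gap - period_min) <= tolerance_min for gap in intervals)


if __name__ == "__main__":
    # Hypothetical event times, not taken from a real log.
    events = [
        "2011-03-01 09:00:10",
        "2011-03-01 09:30:05",
        "2011-03-01 10:00:20",
        "2011-03-01 10:29:50",
    ]
    gaps = disconnect_intervals(events)
    print([round(g, 2) for g in gaps], looks_periodic(gaps))
```

If the gaps cluster tightly around 30 minutes, that would support looking at scheduled work on the vCenter side rather than at the network.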

If you fixed it, what did you do?
