VMware Cloud Community
dcastro
Contributor

MSCS guests losing heartbeat connection

Hi All,

Scenario:

- 2 MSCS 2003 guests installed on two ESX 3.5 (build 153875) hosts.

- MSCS guests booting from local storage.

Issue:

Every time an MSCS guest is rebooted, it loses the heartbeat connection. A second reboot solves the problem and the heartbeat comes back.

The second guest was cloned from the first one.

Some patches have been released after build 153875, but none of them seems to address this issue specifically.

I would appreciate any recommendations or suggestions.

Thanks,

-Diego.

5 Replies
jbruelasdgo
Virtuoso

Did you add the virtual NICs before adding the virtual disks?

Jose B Ruelas

Jose B Ruelas http://aservir.wordpress.com
Rumple
Virtuoso

When you say it loses the heartbeat, do you mean the heartbeat network shows as down in Cluster Administrator?

Does ESX think that the virtual NIC is not connected?

Does Windows think the cable is disconnected, or does it look like it should be working, but isn't?

Are you able to ping the other heartbeat IP from each of the guests?

Are you using a physical switch between the ESX hosts for the heartbeat network, or a crossover cable? If a physical switch, what kind of switch?

Are you using a separate IP space from your public network?

If the heartbeat NIC looks up but you can't reach the other server over that link, I'd suspect ARP issues.
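The ping and ARP checks above could be scripted roughly as follows. This is only a sketch: the peer address 10.10.10.2 and the sample MAC are made-up placeholders, the helper names are hypothetical, and the ping exit code is treated as a rough signal rather than proof of reachability.

```python
import platform
import re
import subprocess

# Hypothetical heartbeat address of the other cluster node; replace with your own.
PEER_HEARTBEAT_IP = "10.10.10.2"

# Matches MAC addresses written with dashes (Windows) or colons (Linux).
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}[-:]){5}[0-9A-Fa-f]{2}")


def ping_command(ip, system=None):
    """Build a one-echo ping command; Windows uses -n, most other systems use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", ip]


def peer_reachable(ip):
    """Return True if a single ping to the peer exits with status 0."""
    result = subprocess.run(
        ping_command(ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def mac_for_ip(arp_output, ip):
    """Return the MAC listed for ip in `arp -a` output (normalized), or None."""
    for line in arp_output.splitlines():
        fields = [field.strip("()") for field in line.split()]
        if ip in fields:
            match = MAC_RE.search(line)
            if match:
                return match.group(0).lower().replace("-", ":")
    return None


def check_peer(ip=PEER_HEARTBEAT_IP):
    """Ping the peer, then look it up in the local ARP cache."""
    reachable = peer_reachable(ip)
    arp_table = subprocess.run(
        ["arp", "-a"], capture_output=True, text=True
    ).stdout
    return reachable, mac_for_ip(arp_table, ip)
```

Calling `check_peer()` on each guest right after a bad reboot, and again after a good one, would show whether the peer's ARP entry is missing or points at a different MAC in the failing case.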

quote - The second guest was a clone from the 1st one.

Did you run Sysprep or NewSID on that clone before using it?

dcastro
Contributor

Jose,

The vNICs were added before the disks. Is there any problem with doing it in that order?

Thanks

dcastro
Contributor

Rumple,

In Cluadmin, the heartbeat connection appears as down, and neither node can ping the other node's heartbeat address.

Under ESX, the vNIC stays connected.

Windows doesn't report the NIC as down.

The heartbeat connection uses a crossover cable.

I would not point to ARP resolution, since the problem is solved after a reboot, but I will check it.

I used Sysprep when cloning the guest.

A coworker reminded me of an important detail: teaming on the interfaces. I don't have access to the environment yet, but I suspect that there is NIC teaming on the vSwitch for the heartbeat.

Thanks for the reply.

Diego Castro

dcastro
Contributor

Hi Team,

The heartbeat issue was solved by removing the teaming on the vSwitch for the heartbeat. Neither VMware nor Microsoft supports clustering with teaming on the heartbeat interfaces, because it causes unexpected heartbeat behavior.
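For anyone who wants to check their own hosts for the same condition, here is a rough sketch that scans a saved vSwitch listing (for example, the output of `esxcfg-vswitch -l` on the service console) for vSwitches with more than one uplink. The column layout in SAMPLE_LISTING, the switch and port group names, and the function name are assumptions for illustration, so compare against your real output before trusting it. It also assumes vSwitch names contain no spaces.

```python
# Illustrative listing only; real output from your host may be laid out differently.
SAMPLE_LISTING = """\
Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch0       32          4           32                vmnic0

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Service Console     portgroup0   0        1           vmnic0

Switch Name    Num Ports   Used Ports  Configured Ports  Uplinks
vSwitch1       64          3           64                vmnic1,vmnic2

  PortGroup Name      Internal ID  VLAN ID  Used Ports  Uplinks
  Heartbeat           portgroup3   0        1           vmnic1,vmnic2
"""


def teamed_vswitches(listing):
    """Return {vswitch_name: [uplinks]} for each vSwitch with more than one uplink."""
    teamed = {}
    expect_switch_row = False
    for line in listing.splitlines():
        # The row right after a "Switch Name" header describes one vSwitch.
        if line.startswith("Switch Name"):
            expect_switch_row = True
            continue
        if expect_switch_row and line.strip():
            expect_switch_row = False
            fields = line.split()
            last = fields[-1]
            # With no uplinks attached, the last column is a port count.
            if last.isdigit():
                continue
            uplinks = last.split(",")
            if len(uplinks) > 1:
                teamed[fields[0]] = uplinks
    return teamed


if __name__ == "__main__":
    for name, uplinks in teamed_vswitches(SAMPLE_LISTING).items():
        print(f"{name} has {len(uplinks)} uplinks: {', '.join(uplinks)}")
```

Any vSwitch reported here that carries the cluster heartbeat port group is a candidate for the teaming problem described in this thread.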

Thanks everybody.

Diego.
