Dear all
I'm running a vSphere 4.0 cluster with two IBM 3550M2 machines configured to use HA.
System configuration is as follows: 2 vmnics on subnetA (console) and 2 vmnics on subnetB.
I noticed that the following sequence causes the host to crash and shut down:
1. Start a VM on subnetB
2. Unplug one vmnic on subnetB
3. ESX crashes
4. The VM starts on the other host (expected behaviour).
I first suspected a hardware problem, but together with IBM I updated all possible firmware (IMM, UEFI, ServeRAID, Broadcom).
Has anyone experienced this? Thank you for any comments.
Gabriel
Do you see anything in the logs on the host? Also, by crash, do you mean it reboots or do you experience a PSOD?
If you found this at all helpful please award points by using the correct or helpful buttons! Thanks!
Hi Jamesbowling,
the server shuts down. I have to start it again by turning the power back on.
Regards,
Gabriel
Does this happen when you remove a cable from SubnetA's group of NICs?
One important detail I left out: the crash happens only if a VM is running on that subnet.
If no VM is running, unplugging the network cable does not cause a crash.
Have you looked at the logs on the host to see whether they show any errors? That would be the first place to look.
In vCenter I see the following events:
04-11-2010 11:11:42  error    Host is not responding
04-11-2010 11:09:14  warning  VMNIX: <0>Dazed and confused, but trying to continue (0:00:06:53.025 cpu0:4096)
04-11-2010 11:09:14  warning  VMNIX: <0>Do you have a strange power saving mode enabled? (0:00:06:53.025 cpu0:4096)
04-11-2010 11:09:14  warning  VMNIX: <0>Uhhuh. NMI received for unknown reason 3d. (0:00:06:53.024 cpu0:4096)
04-11-2010 11:09:14  warning  APIC: 1385: Lint1 interrupt on pcpu 0 (0:00:06:53.024 cpu0:4096)
A couple of general questions:
- Are you using the defaults for NIC Teaming on your PortGroup for SubnetB?
- Are you able to view the host logs directly instead of through vCenter?
We would need to look further into the host logs, such as /var/log/vmware/hostd.log; they may show something else happening. Does this happen when either NIC is removed from SubnetB?
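If it helps with going through the logs, here is a minimal Python sketch for picking the NMI-related lines out of a log file that has been copied off the host. The message fragments are taken from the vCenter events quoted above; the function name `find_nmi_lines` is just something I made up for illustration, and I am not assuming which host log (if any) actually contains these messages.

```python
import re

# Message fragments taken from the vCenter events quoted in this thread.
# Extend the pattern if your logs word these messages differently.
NMI_PATTERNS = re.compile(
    r"NMI received|Lint1 interrupt|Dazed and confused|strange power saving"
)


def find_nmi_lines(lines):
    """Return (line_number, text) pairs for lines mentioning an NMI-related message.

    `lines` is any iterable of strings, e.g. an open file handle.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if NMI_PATTERNS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

You would call it with an open file handle for whichever log you copied, then compare the timestamps of the hits with the moment the cable was unplugged.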
Hello all,
I'm sorry I didn't reply earlier, and thank you for all your help. It was a hardware problem after all.
IBM replaced both the motherboard and the NIC board at the same time, and the problem never happened again.
Thank you for all the help once more.
Gabriel
Glad to hear that. How did you know it was a hardware problem in the first place? Did you run any hardware diagnostic checks?
Hello idle
Together with IBM we ran a series of tests and upgraded the firmware to the latest levels. Although the IBM hardware tests always reported everything in good status, a crash happened while repeating the tests (ESX was down at the time). After this event, IBM Labs decided to replace both hardware parts, which had previously been replaced at different times.
Happy New Year.
Gabriel