VMware Cloud Community
nvlop
Contributor

ESX host crash with purple screen during broadcast storm

Hi everybody,

We had a really strange issue last night on our ESX infrastructure:

We noticed that some of our VMs had gone down and were no longer accessible, so we started looking at the ESX hosts and saw that one of them was down.

After long hours of diagnostics, we saw that one port of our main switch was causing a broadcast storm, so we unplugged it and restarted our ESX host, and everything went fine again.

In vCenter, we had a lot of alerts saying that some VMs were unreachable, and also a CPU temperature alert on the host.

On the ESX console, we saw a purple screen (unfortunately we have no screenshot), and it mentioned "CPU Locked up"!

My question is: why would a broadcast storm crash our ESX hosts?

We have 3 ESXi hosts (5.1.0 build 799733) on HP ProLiant DL360 G7 servers and an HP MSA Storage P2000.

Does anyone have any ideas, or has anyone run into the same situation?

Thanks a lot

Best

Nic.

4 Replies
nvlop
Contributor

Is anyone else getting the same error?

To give more information, we use Broadcom NC382i multi-port PCI Express NICs with the bnx2 driver on the vSwitch.

Thanks for your help,

Best

Jimmy15
Enthusiast

If a NIC fails or malfunctions, it can leave ESXi unresponsive. I went through such an instance in my production environment last week.

It wouldn't let me SSH in or log in through the DCUI. I shut down all the VMs (running in memory) gracefully and rebooted the host. Surprisingly, the red cross shown over the NIC in vCenter disappeared.

The CPU error on the purple screen has nothing to do with this.


regards



PS: Mark kudos or correct answer as appropriate 🙂
nvlop
Contributor

Hello Jimmy,

Thanks for your reply.


Yes, that is totally true.

Our ESX host was inaccessible until we fixed the switch port causing the broadcast storm and rebooted the host; after that, everything went fine.

My question is: how can we prevent this type of crash? Is there anything we can do in the ESX configuration or the switch configuration to prevent it?

And how can you explain that our ESX host's fans were working hard (a lot of noise, fans at 100%) and that the power supply at the back of the host showed no green LED (as if there were no power going to it)?

It was so weird that we were completely lost during this crash... We first thought it was an electrical problem, and finally found that it was a local network loop in the building causing a broadcast storm, which was crashing our hosts...

Any advice or explanation regarding this crash is welcome.

Best

nvlop
Contributor

Bump, guys, please; we really need to understand this strange behavior.

Best regards
