VMware Cloud Community
JustinD
Contributor

Curious Problem with VMs losing network connectivity...

Hi,

I'm pretty new to VMware, so I am really after some guidance as to where to start.

I have several ESX 3.0.1 servers, set up on HP blades with SAN disks. The virtual servers run Windows Server 2003 R2.

My problem is with a single ESX server that works with 2 others to provide our VM environment. Several days ago, both VMs on this ESX server lost their network connections. They can ping themselves but no other network devices. There are no errors in the event logs, and the network cards show a status of connected.

The network cards show as "flexible" on Vcenter and the physical cards are NC7782 (2) and NetXtreme BCM5703 (2).

Rebooting the ESX server fixes the problem temporarily...

Since the problem occurred, I have moved the two VMs to another ESX server, where they are fine. I then moved another (unused) VM to the problem ESX server. So far the problem has not recurred.

Rebooting the server does not raise any errors during POST.

So I want to know where to look in ESX before I move any production VMs back...

Something to note... not sure if this is related, but the network config of these servers has only one NIC assigned to the VMs. Our other ESX environment (essentially the same hardware, but supporting a different domain) has two NICs assigned to the VMs. So I have now allocated the free NIC to the VMs...

Any guidance much appreciated!

Justin

12 Replies
wobbly1
Expert

The obvious place to start is the firmware and BIOS of the server compared to the others that are working correctly. Check the IML log for errors, as well as the logs under /var/log.

Worst case, and if you have the option, I would suggest reinstalling ESX onto the problem box just in case there is a problem with the install.

JustinD
Contributor

Hiya,

Thanks for that... I checked the logs (not sure which is which, so I checked them all) but nothing jumps out.

The 3 servers were built at the same time and are identical in firmware and BIOS...

The rebuild is going to be the way to go, but I will try it with a couple of test VMs for a bit to see if the error comes back.

Thanks

Justin

wally
Enthusiast

Justin, I don't know the physical cards you mention, but are any of them using the Intel E1000 driver/module? Your description exactly matches the "Tx unit hang" problem with those NICs. If so, you could grep for "Tx" in your /var/log/vmkernel logs.
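
If it helps, here is a rough sketch of that search in Python rather than grep. It assumes Python 3 on whatever machine holds the logs (or copies of them), and that the files match /var/log/vmkernel* as in the path above. The function names `find_matching_lines` and `scan_logs` are made up for illustration; nothing here is a VMware tool.

```python
import glob


def find_matching_lines(lines, needle="Tx"):
    """Return (line_number, text) pairs for lines containing needle.

    The match is case-sensitive, like a plain grep "Tx".
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


def scan_logs(pattern="/var/log/vmkernel*", needle="Tx"):
    """Search every file matching pattern (an assumed location) for needle."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as handle:
                matches = find_matching_lines(handle, needle)
        except OSError:
            # Skip files we cannot read instead of aborting the whole scan.
            continue
        if matches:
            hits[path] = matches
    return hits


if __name__ == "__main__":
    for path, matches in scan_logs().items():
        for number, text in matches:
            print(f"{path}:{number}: {text}")
```

Any hit is only a starting point: a plain "Tx" match will also catch ordinary transmit-related messages, so the lines still need to be read for an actual hang report.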

piacas
Enthusiast

Are the ESX servers plugged into a Cisco switch? If port security is enabled on the port the ESX server is plugged into, the Cisco switch will 'see' multiple MAC addresses coming from the same port and start blocking it. Disable port security on the Cisco switch ports for your ESX servers.

Dave

JustinD
Contributor

Hi Wally,

I did not find anything in the logs about a unit hang. There are entries for all the NICs, but no errors. At least no obvious errors - I am doing some ESX training in May, which is way overdue.

One thing... three of the NICs are currently running at 100Mb as we have to do some fiddling shortly to put in 1Gb switches...

JustinD
Contributor

Hi Dave,

We use 3Com switches and a Cisco router...

The odd thing is that the other ESX servers have not exhibited the problem and it has not recurred on the first server.

zenariga
Enthusiast

Hi guys,

I have the same environment and the same problem. What is the correct answer for this topic?

Thanks

zenariga
Enthusiast

I ask because I looked at the Cisco switch and it doesn't have any policy configured.

JustinD
Contributor

Hi,

Unfortunately (?), the problem has not recurred and I am not able to reproduce it.

I think it may be related to not yet having the whole environment running at 1Gb network speeds. This is to be fixed shortly (can't use VMotion without it), and I am also attending the VMware course next month, so that may shed some light on the problem.

Or it could be that I had forgotten to make two NICs available to the VMs. But then, I have not done this for the other two ESX servers and they are fine.

Regards

Justin

zenariga
Enthusiast

Thanks, I will keep looking for a solution to this problem.

Thanks Again

soren
Contributor

Same problem - all VMs on an ESX server suddenly lose network connectivity. No outgoing or incoming network access. Guest IP configuration on Windows is fine.

Solved the problem (this time) by:

- disconnecting the network and re-connecting it (from VI client: edit settings, network adapter 1, unselect "connected", ok; then, edit settings, network adapter 1, select "connected", ok)

- or rebooting the VM

It happened at the same time as powering on a new VM on the same port group. This server is installed with ESX 3.0.0. It also happened on another 3.0.0 server 6 months ago.

Soren

JustinD
Contributor

Closing as the problem has not reoccurred.
