VMware Cloud Community
dustynz
Contributor

Service console drops off network. VM's okay.

I upgraded from ESX 3.0.2 to 3.5 U4 last Friday.

The upgrade went smoothly; however, when I checked the system on Monday, vCenter was unable to contact the host. The VMs that were running on it were fine and continued to run.

Previously this machine had an unusual problem where the vMotion IP would do the same thing. I gave it a new IP on Friday as part of the upgrade.

I had a quick look through the logs for the IP and vswif0, but no luck.

How should I proceed?

It is part of an IBM BladeCenter and the blade is an HS20.

There are 8 ESX blades in the Cluster.

8 Replies
Lightbulb
Virtuoso

So is it still disconnected from VirtualCenter, or did a hostd restart resolve the issue?

Have you verified the /etc/hosts and DNS entries on both the host and vCenter?

I assume you could access the system directly via SSH or the VI Client, is this so?
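The /etc/hosts check asked about above could be scripted roughly as follows. This is only a sketch: the IP addresses, the domain, and the vCenter name `vcenter01` are made up for illustration, and `ccesx02` is the host name as it appears later in this thread. It is written for Python 3, so it is meant to be run on a workstation against a copy of the file rather than on the service console itself.

```python
# Hypothetical /etc/hosts content; addresses use the 192.0.2.0/24 documentation range.
SAMPLE_HOSTS = """\
127.0.0.1   localhost.localdomain localhost
192.0.2.12  ccesx02.example.local ccesx02      # ESX host (domain is an assumption)
192.0.2.10  vcenter01.example.local vcenter01  # hypothetical vCenter name
"""


def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # keep the first entry seen
    return table


def missing_entries(text, required):
    """Return the names in `required` that have no entry in the hosts text."""
    table = parse_hosts(text)
    return [name for name in required if name.lower() not in table]


if __name__ == "__main__":
    print(missing_entries(SAMPLE_HOSTS, ["ccesx02", "vcenter01"]))
```

Running the same check against the hosts file from both the ESX host and the vCenter server shows whether each side has an entry for the other. It says nothing about DNS itself, which still needs nslookup or similar.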

dustynz
Contributor

I was unable to ping the service console or SSH to it.

I remoted to the VMs on the host and shut them down.

Then at the server I issued a reboot.

It came up fine after the restart. Monitoring to see how it goes.

DNS was fine. Both are able to resolve each other via nslookups and pings.

It almost looks like an issue with the virtual switch, as the virtual machines are sitting on the same vSwitch0 and they were unaffected.

Next time, should I have a look at the routing tables and ARP tables?

Lightbulb
Virtuoso

If the issue persists, you might want to think about evacuating the VMs to another node in the cluster and then doing a fresh build. Or you could VMotion the VMs off to another node and rebuild ESX networking from the console. It depends how much spare time you have. If you do a fresh build and the issue crops up again on the same host, you will know that your issue is more than a misconfiguration or bug.

dustynz
Contributor

Rebuilt the blade server yesterday and chucked some monitoring on it.

At 8:18 this morning it dropped off the network. This is a completely new build: formatted HDD, not an upgrade. I did recycle the network IPs and matched the settings from the other blades in the environment.

Here are the entries from the vmkernel log file.

Apr 29 07:45:36 ccesx02 vmkernel: 0:14:44:40.524 cpu2:1026)LinNet: 2017: invalid vlan tag: 4095 dropped
Apr 29 08:03:36 ccesx02 vmkernel: 0:15:02:40.785 cpu2:1026)LinNet: 2017: invalid vlan tag: 4095 dropped
Apr 29 08:18:11 ccesx02 vmkernel: 0:15:17:15.836 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:18:16 ccesx02 vmkernel: 0:15:17:20.846 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:18:17 ccesx02 vmkernel: 0:15:17:21.847 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:19:50 ccesx02 vmkernel: 0:15:18:55.030 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
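For anyone sifting through a longer vmkernel log for the same two messages, here is a minimal Python sketch that pulls out the "invalid vlan tag" drops and the tg3 rx_mode changes and shows which bits differ between the old and new rx_mode values. The regular expressions are written only against the six entries quoted above, so treat them as assumptions about the log format rather than a general parser. It targets Python 3 and is meant to be run on a copy of the log, not on the service console.

```python
import re

# The six entries from the excerpt above, one entry per line.
SAMPLE_LOG = """\
Apr 29 07:45:36 ccesx02 vmkernel: 0:14:44:40.524 cpu2:1026)LinNet: 2017: invalid vlan tag: 4095 dropped
Apr 29 08:03:36 ccesx02 vmkernel: 0:15:02:40.785 cpu2:1026)LinNet: 2017: invalid vlan tag: 4095 dropped
Apr 29 08:18:11 ccesx02 vmkernel: 0:15:17:15.836 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:18:16 ccesx02 vmkernel: 0:15:17:20.846 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:18:17 ccesx02 vmkernel: 0:15:17:21.847 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
Apr 29 08:19:50 ccesx02 vmkernel: 0:15:18:55.030 cpu3:1027)<6>tg3: vmnic0: rx_mode 0x2(off)=>0x102 tg3_flags 0x82467ca5 tg3_flags2 0x40a804
"""

# Syslog timestamp, then the message of interest somewhere later on the line.
VLAN_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*invalid vlan tag: (\d+) dropped")
RXMODE_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) .*tg3: (\w+): rx_mode (0x[0-9a-f]+)\(\w+\)=>(0x[0-9a-f]+)"
)


def summarize(log_text):
    """Return (vlan_drops, rx_changes) found in the log text.

    vlan_drops: list of (timestamp, vlan_id)
    rx_changes: list of (timestamp, nic, old_mode, new_mode, changed_bits)
    """
    vlan_drops, rx_changes = [], []
    for line in log_text.splitlines():
        m = VLAN_RE.search(line)
        if m:
            vlan_drops.append((m.group(1), int(m.group(2))))
            continue
        m = RXMODE_RE.search(line)
        if m:
            old, new = int(m.group(3), 16), int(m.group(4), 16)
            # XOR leaves only the bits that differ between old and new.
            rx_changes.append((m.group(1), m.group(2), old, new, old ^ new))
    return vlan_drops, rx_changes


if __name__ == "__main__":
    drops, changes = summarize(SAMPLE_LOG)
    for ts, vlan in drops:
        print(ts, "dropped frame with VLAN tag", vlan)
    for ts, nic, old, new, diff in changes:
        print(ts, nic, "rx_mode", hex(old), "->", hex(new), "changed bits", hex(diff))
```

On the excerpt, every rx_mode change differs by the single bit 0x100, and the first one lines up with the 8:18 drop-off. Which flag that bit corresponds to is not stated in this thread and would need to be checked against the tg3 driver source.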

I have since given the host new IPs, just in case another device on that network is trying to use the same addresses.

Ideas?

JDLangdon
Expert

Does your host crash, or does it simply drop off the network? I have a similar issue with an IBM xSeries 366 where the host crashes but the VMs remain running. What we are experiencing is a known issue with IBM hardware running VMware and is related to a firmware mismatch. If you haven't yet done so, I recommend you contact IBM and have them examine your DSA reports.

________________________________

Jason D. Langdon

dustynz
Contributor

The ESX host does not crash.

I can still hop on the console and issue commands, etc. I just can't ping or connect to anything. It is as if the service console has dropped off the virtual switch.

I can still connect to the VMs over the network. I connect to them via Remote Desktop and power them off. I then hop on the console to restart the ESX host at the command prompt.

JDLangdon
Expert

That's not the same problem I am experiencing. How do you have vSwitch0 configured? Is it an uplink port or a VLAN trunk?

________________________________

Jason D. Langdon

dustynz
Contributor

Single vSwitch, with the service console and VMkernel on the same vSwitch.

With ESX 3.0.2 we kept suffering a problem where the VMkernel IP kept dropping off. It was just the single host in a cluster of 8.

After upgrading to 3.5 U4 we now suffer a problem where the service console drops off the network.

VLAN trunk. Same config as the other 8 blades. I checked the blade switch and the configs are all the same.
