We have a cluster with 16 HP ProLiant BL460c hosts.
Four of the ESX servers suddenly have an HA error, and "Reconfigure for HA" doesn't help.
If I look under Configuration > Networking, it's empty.
I can't move the VMs to another ESX host because VC says I don't have a management network.
It's a production cluster, so I can't reboot the ESX hosts.
If I look at the ESX console, the network configuration is OK.
I restarted all services, but nothing helps.
I can't add a network through VC. The message says it already exists...??
Has anyone seen this issue before?
We have a case open with VMware too, but still no luck.
//Lasse
Have you tried removing the ESX host from the VC inventory and re-adding it? This should not impact the running VMs, but it will let VC re-register the configuration of the ESX host. I have not seen this issue myself, but that is the first thing I would try.
Have you verified the configs using esxcfg-vswitch -l or esxcfg-nics -l on the problematic hosts?
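For reference, here is the check sequence on the service console (standard ESX 3.x commands, run as root; a quick sanity sketch, not an exhaustive diagnostic):

```shell
# Confirm the host-side view of the network config. If this output looks
# correct but VC shows nothing, the problem is stale data on the VC side
# rather than on the host itself.
esxcfg-vswitch -l   # list virtual switches, port groups, and uplinks
esxcfg-vswif -l     # list service console interfaces (vswif0, etc.)
esxcfg-nics -l      # list physical NICs with link state and speed
```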
The first thing I would try is putting the host into maintenance mode; that should allow you to migrate the guests onto other hosts. Then you can troubleshoot the problematic host.
Try connecting directly to the ESX server instead of through VC. Do you see the network configuration that way? If you do, remove the host from VC and add it again. If you can't connect remotely to the ESX server, then you need to check the configuration as someone already suggested: esxcfg-vswif -l, etc.
It's not possible to remove the ESX host from the cluster; I can't put it into maintenance mode. The message says it has no management network, then the operation times out.
Hi,
Same result if we connect directly to the host. No network config...
As I wrote before, all the info in the console from esxcfg-vswitch -l, esxcfg-vswif -l, and esxcfg-nics -l is OK.
And it's not possible to move it out of the cluster. VC doesn't think the ESX host has any network.
Using esxcfg-vswif, can you add a second Service Console?
I had something similar; restarting the host management service fixed it.
service mgmt-vmware restart, if I remember rightly.
Try 'service network restart'; this restarts the networking services. Then I would restart the management services with 'service mgmt-vmware restart'. As a last option I would reboot, given the issues you are having.
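The sequence above as run on the service console, with the VirtualCenter agent (vpxa) added, since that agent is what reports the host config to VC and is sometimes the stale component (a sketch; service names are the standard ESX 3.x ones):

```shell
# Restart networking and the management agents on the ESX service console.
# Running VMs are not affected, but the host will briefly show as
# disconnected in VC while the agents restart.
service network restart       # restart service console networking
service mgmt-vmware restart   # restart hostd (the host management agent)
service vmware-vpxa restart   # restart vpxa (the VirtualCenter agent);
                              # not mentioned above, but worth including
```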
None of the service restarts helps; I've tried them before.
Reboot is not an option. All the VMs are production servers with 24/7 uptime.
Try this to create a new service console:
esxcfg-vswitch -a vSwitch2 - This creates a new vSwitch (vSwitch2 cannot already exist)
esxcfg-vswitch -A 'Backup Service Console' vSwitch2 - This adds the port group to the new vSwitch
esxcfg-vswif -a vswif1 -i 192.168.0.100 -n 255.255.255.0 -b 192.168.0.255 -p 'Backup Service Console' - This creates the new Service Console (needs an unused IP)
esxcfg-vswitch -L vmnic2 vSwitch2 - This links an unused NIC to vSwitch2
I know about this option to create a backup switch,
but I don't have any free NICs.
Just do it without any NICs assigned, or add the new Service Console port group to the production vSwitch. It won't be used until you change the default gateway device, so there's nothing to worry about. The point is to see whether anything you set in the networking configuration has any effect.
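The earlier recipe adapted for an existing switch, assuming the production switch is vSwitch0 and that 192.168.0.100/24 is an unused address (both are placeholders; substitute your real values):

```shell
# Add a second Service Console port group to the existing production
# vSwitch instead of creating a new switch, so no spare NIC is needed.
esxcfg-vswitch -A 'Backup Service Console' vSwitch0
esxcfg-vswif -a vswif1 -i 192.168.0.100 -n 255.255.255.0 \
    -b 192.168.0.255 -p 'Backup Service Console'
service mgmt-vmware restart   # restart hostd so it picks up the change
```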
lol
I added a new "Backup Service Console" to the production vSwitch0 and restarted all the services, but nothing shows up.
amazing
Can you attach the output of esxcfg-vswif -l and esxcfg-nics -l here?
Here is my config:
Name    Port Group       IP Address  Netmask      Broadcast     Enabled  DHCP
vswif0  Service Console  10.2.6.27   255.255.0.0  10.2.255.255  true     false

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0     64         6           64                1500  vmnic1,vmnic0

  PortGroup Name   VLAN ID  Used Ports  Uplinks
  Service Console  2        1           vmnic0,vmnic1
  VMOTION          200      1           vmnic0,vmnic1

Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     64         8           64                1500  vmnic3,vmnic2

  PortGroup Name  VLAN ID  Used Ports  Uplinks
  VLAN129         129      0           vmnic2,vmnic3
  VLAN152         152      0           vmnic2,vmnic3
  VLAN151         151      0           vmnic2,vmnic3
  VLAN149         149      0           vmnic2,vmnic3
  VLAN148         148      0           vmnic2,vmnic3
  VLAN147         147      0           vmnic2,vmnic3
  VLAN135         135      0           vmnic2,vmnic3
  VLAN132         132      0           vmnic2,vmnic3
  VLAN10          10       1           vmnic2,vmnic3
  VLAN2           2        3           vmnic2,vmnic3

Name    PCI       Driver  Link  Speed     Duplex  MTU   Description
vmnic0  02:03.00  bnx2    Up    1000Mbps  Full    1500  Broadcom Corporation Broadcom NetXtreme II BCM5706 1000Base-SX
vmnic1  02:04.00  bnx2    Up    1000Mbps  Full    1500  Broadcom Corporation Broadcom NetXtreme II BCM5706 1000Base-SX
vmnic2  08:00.00  bnx2    Up    1000Mbps  Full    1500  Broadcom Corporation Broadcom NetXtreme II BCM5708 1000Base-SX
vmnic3  0a:00.00  bnx2    Up    1000Mbps  Full    1500  Broadcom Corporation Broadcom NetXtreme II BCM5708 1000Base-SX
A couple of questions for you:
1) Is this host part of a cluster?
2) Are other hosts having the same issue?
3) Are the guests running properly?
4) Is there an unused guest on another host you can test-migrate to and from the bad host?
5) Do you have ANY scheduled maintenance window?
I ask these questions in preparation for simply powering off the bad host. I understand you are 24/7, but if there are other hosts and you can migrate guests on and off the bad one, powering off the bad host will cause its guests to be powered on on another host. And of course you want to do this only if you have some kind of maintenance window. I wish you the best of luck, and I will continue to watch this thread!
Hi
As you can see at the top, we have 16 ESX hosts on HP 460 C-Class blades in the same cluster.
Four ESX hosts have the same problem, but they are NOT in the same enclosure and NOT in the same computer hall.
It's not possible to migrate to or from these hosts.
We have no maintenance windows in the near future.
I'm out of ideas for now. Keep us posted if anything changes.