VMware Cloud Community
bobst_martin
Contributor

ESX host not responding in VIC

Hi everybody,

This morning one ESX host is shown as "not responding" in the VI Client, but its 2 VMs are still running well.

I've tried:

  1. service mgmt-vmware restart

  2. service vmware-vpxa restart

The mgmt-vmware service never stopped.

vmware-vpxa stopped and started successfully, but that didn't solve my problem.

What should I do? Any ideas? Thanks for any help.


Accepted Solutions
admin
Immortal

Are you swapping really heavily? Is the / or /var partition full? Is hostd or mgmt-vmware spiking?

Run the following commands to check:

free -m

df -h

top
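
The three checks above can be wrapped in small helpers so the numbers are easier to compare between hosts. This is only a sketch: the function names and the 90% threshold are made up for illustration, and it assumes the service console's `free` and `df` print the usual procps/coreutils column layout.

```shell
#!/bin/sh
# Hypothetical helpers (names and the 90% default are illustrative assumptions).
# Each one reads command output on stdin, so it also works on saved output.

# Print the swap in use (MB) from `free -m` output.
swap_used_mb() {
    awk '$1 == "Swap:" { print $3 }'
}

# Print mount points whose Use% is at or above the threshold (default 90)
# from `df -P` output (-P keeps every filesystem on a single line).
full_partitions() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6
    }'
}

# Possible use on the host:
#   free -m | swap_used_mb
#   df -P | full_partitions 90
#   top -b -n 1 | head -20    # one batch-mode snapshot; look for hostd near the top
```

A large swap figure, a listed / or /var, or hostd sitting at the top of the `top` snapshot would each point at one of the three questions asked above.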

11 Replies
nirubagur
Enthusiast

What is the response when you ping the service console? Check the VC logs; maybe you can find something there. You could try bouncing the host, but that's the last option!

Award points if you found this answer helpful.
bobst_martin
Contributor

Thanks for helping me.

The logs are too small and fill up very fast; the oldest line is from yesterday 9pm, and it still says the same thing.

Every 2 or 3 minutes VC tries to move the VMs to another host, but it fails:

"Failed to fail-over VM1 on ESX2 in cluster ...."

"Failed to fail-over VM2 on ESX2 in cluster ...."

"Failed to fail-over VM1 on ESX4 in cluster ...."

"Failed to fail-over VM2 on ESX4 in cluster ...."

"Not enough resources to failover VM1 in cluster ..."

"Not enough resources to failover VM2 in cluster ..."

...and the same 6 messages again and again...

First important thing: the cluster has 4 hosts: ESX1, ESX2, ESX3 and ESX4.

ESX3 is not responding.

ESX1 seems to be fine.

VC never tries to fail over a VM to ESX1, which is strange.
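
Counting the target host named in each "Failed to fail-over" event makes it easier to see which hosts VC actually tried. A minimal sketch, assuming the events have been copied into a text file in the wording quoted above; the function name and the file name `events.txt` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "Failed to fail-over <VM> on <HOST>" events per host.
# Reads event text on stdin and prints "<HOST> <count>", sorted by host name.
failover_targets() {
    awk '/Failed to fail-over/ {
             # The host name is the word that follows "on".
             for (i = 1; i < NF; i++)
                 if ($i == "on") { count[$(i + 1)]++; break }
         }
         END { for (h in count) print h, count[h] }' | sort
}

# Possible use:
#   failover_targets < events.txt
```

A host that never shows up in the output was never chosen as a failover target.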

Second important thing: when I select the cluster in the VI Client, I get a yellow banner with this text:

"Configuration Issues

Insufficient resources to satisfy HA failover level on cluster

Unable to contact Primary HA agent in cluster"

What do you mean by "bouncing"? Shutting down and restarting the host?

IB_IT
Expert

the mgmt-vmware never stopped

If your mgmt-vmware service never stopped or started correctly, you could have bigger issues on your hands. I would suggest opening an SR with VMware to troubleshoot further. You may need to force some processes to shut down by killing their PIDs. You could also have a hardware issue.
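
As a rough illustration of "killing the PIDs", the sketch below pulls the process IDs out of `ps -ef` output. The function name is hypothetical, and treating `vmware-hostd` as the stuck process behind mgmt-vmware is an assumption; check the process list on the host first.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes whose command line contains
# the given name, reading `ps -ef` output on stdin (PID is the 2nd column).
# Lines mentioning awk are skipped so the helper does not list itself.
pids_of() {
    awk -v name="$1" 'NR > 1 && index($0, name) && $0 !~ /awk/ { print $2 }'
}

# Possible use on the host (vmware-hostd is an assumed process name):
#   ps -ef | pids_of vmware-hostd
#   kill <pid>        # try a plain TERM first
#   kill -9 <pid>     # only if the process ignores TERM
#   service mgmt-vmware start
```

Running VMs are separate processes from the management agent, which matches the observation that the VMs kept running while the host showed as not responding.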

bobst_martin
Contributor

Wow, you scared me.

I'm waiting for the users to stop working, because 1 VM is running Exchange.

Then I will reboot the host.

If it's a hardware fault, the problem will reappear.

Anyway, I'm able to run with only 3 ESX hosts for a while.

Edit:

While writing this post, I received the green light to power down ESX3.

ESX3 is still powering down, but it has released all its VMs.

Now all VMs are running on the other hosts.

I'm still waiting for the power-down to finish so I can power it back on and troubleshoot...

Edit 2:

The shutdown never completed; I had to power off with the power button.

Power-on hangs. I get some strange characters on the screen during startup, like "]*%~-\^" and so on...

It's a Fuji server, and the Fuji agent doesn't report any hardware trouble...

I'm surprised.

I will format and reinstall ESX 3.5.

The server is very new; it was a clean install in January, not an upgrade from 3.0.

Any suggestions before I format? Should I call VMware? Maybe it's too late :(

bobst_martin
Contributor

What a strange problem.

I tried powering off and on again, without success.

I went into the BIOS and checked all parameters: CPU OK, RAM OK, ...

In the BIOS I enabled a complete test of all 4 CPUs and all 24 GB of RAM at the next startup.

After a long boot, VMware started fine, and it's still working.

Nothing was changed, and it works!

Afterwards I updated the Fuji agent to a newer version to monitor the hardware better.

Thanks for your help.

Dave_Mishchenko
Immortal

Your post has been moved to the ESX Server 3.x Configuration forum

Dave Mishchenko

VMware Communities User Moderator

ggrayson
Contributor

I just went through a similar issue and ended up removing and re-creating 'vswif0' through the iLO using 'esxcfg-vswif'.
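
For reference, a dry-run sketch of what removing and re-creating the interface could look like. The helper only prints the commands; its name, the IP address, the netmask and the "Service Console" port group name are placeholders, not values from this thread. Deleting vswif0 drops the network connection to the service console, which is why this is done from the iLO console.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the esxcfg-vswif commands that would list,
# delete and re-create a service console interface. Nothing is executed.
# Arguments: interface, IP address, netmask, optional port group name.
recreate_vswif_cmds() {
    nic="$1"; ip="$2"; mask="$3"; pg="${4:-Service Console}"
    echo "esxcfg-vswif -l"
    echo "esxcfg-vswif -d $nic"
    echo "esxcfg-vswif -a $nic -p \"$pg\" -i $ip -n $mask"
}

# Placeholder values; substitute the host's real address and netmask.
recreate_vswif_cmds vswif0 192.0.2.10 255.255.255.0
```

Running the listing command first records the current settings so they can be re-entered afterwards.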

bobst_martin
Contributor

I will keep it in mind for next time; it seems like a great idea.

But I was able to ping the vswif0 IP address, and PuTTY was able to connect too.

admin
Immortal

Are you swapping really heavily? Is the / or /var partition full? Is hostd or mgmt-vmware spiking?

Run the following commands to check:

free -m

df -h

top

bobst_martin
Contributor

Thanks appk, but the problem is no longer present.

As I said a few lines above, after power-off / BIOS reset / power-on, the ESX host started normally.

But I still don't know what happened.

Now I'm collecting any ideas and commands for running checks next time (hoping I never have this problem again).

Anyway, I will keep your commands too; I haven't tried them.

What I'm sure of is that the disk was not full (neither / nor /var, ...)

and that killall5 didn't kill all processes.

bobst_martin
Contributor

The problem never appeared again.

Problem solved by a reboot and BIOS reset.

Thanks to all.
