VMware Cloud Community
tekhie
Contributor

Monitoring Host failures via Virtual Centre

Hi everyone - we had a situation last week where one of our ESX hosts stopped working correctly. The Service Console was responding to pings and the VMs were running, but the host was showing as disconnected in VC and could not be managed in any way. It took a server reboot to get things back to normal. To rectify this, on VMware's recommendation, we have increased the SC memory to 800MB and are going to upgrade the Insight Manager agents. My question to you all is what you have decided to monitor in your virtual infrastructures and how you do it.

I have seen in VC that you can configure an alarm to notify when the host state is either 'Disconnected' or 'Not responding'. What is the difference between the two, and under which circumstances would an alert be sent for each?

Apart from this, any suggestions on what else I could monitor around the HA/DRS/Service Console functionality in order to prevent an issue like the one I had last week? Our VMs are all monitored via a third-party agent, so that is not an issue.

As always, I look forward to your responses.

Chris

3 Replies
conyards
Expert

Chris, it sounds very much to me like the VPX agent crashed on the host. Most likely you could have got the host back and manageable by Virtual Centre by running the command 'service mgmt-vmware restart'. Judging by the support response you've received, it was the memory required by the Insight agents that caused the failure. The same problem can be seen with IBM Director and the Java processes that it spawns, on a busy host with, say, 256MB of RAM allocated to the Service Console.
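
If you want to catch this state from outside VC, a rough Python sketch of the idea is below: the console answers but the management agent does not. This is only an illustration, not a VMware tool. The function names are made up, and probing TCP port 443 as a stand-in for "the management agent is answering" is an assumption you should check against your own hosts.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def classify_host(console_reachable, agent_port_open):
    """Map two probe results to a coarse state (labels are hypothetical).

    console_reachable: the Service Console answered a ping or similar probe.
    agent_port_open:   the assumed management port accepted a connection.
    """
    if not console_reachable:
        return "host-unreachable"
    if agent_port_open:
        return "ok"
    # Console is up but the agent is not answering: the situation described
    # in this thread, where restarting the management service may help.
    return "agent-down"
```

For example, classify_host(True, tcp_port_open("esx01.example.com", 443)) would report "agent-down" for a host in the state Chris describes (the hostname here is a placeholder).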

Sorry if this is a little off topic.

Simon

https://virtual-simon.co.uk/
jccoca
Hot Shot

Hi,

The alert will be sent whenever you choose: you can configure an alarm for host disconnection, and likewise for 'Not responding'.

The difference is that 'Not responding' triggers when vCenter has lost its connection to the host, whereas a host can be 'Disconnected' without communication having been lost.

We monitor the 'Not responding' state from vCenter, and the hardware status with IBM RSA adapters.

admin
Immortal

Hi

I had the same problem with IBM Director, and VMware Technical Support recommended that I increase the service console memory to 512 MB so that there is enough memory for the agent in the service console.
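
Since the root cause in both cases was the agents running the service console out of memory, it may be worth watching console memory directly. Below is a minimal Python sketch that reads Linux-style /proc/meminfo text and flags low memory. The function names and the 50 MB threshold are assumptions for illustration only, and counting MemFree + Buffers + Cached as "free" is a simplification.

```python
def parse_meminfo(text):
    """Parse 'Name: <number> kB' lines into a dict of name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a colon or without a numeric first field.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def console_memory_low(text, min_free_kb=51200):
    """Return True if free + buffers + cached falls below min_free_kb.

    min_free_kb is a hypothetical threshold (50 MB); tune it for your hosts.
    """
    info = parse_meminfo(text)
    free_kb = (
        info.get("MemFree", 0)
        + info.get("Buffers", 0)
        + info.get("Cached", 0)
    )
    return free_kb < min_free_kb
```

On the console itself this could be called as console_memory_low(open("/proc/meminfo").read()), with the result passed to whatever alerting you already use.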

Jeff
