VMware Cloud Community
ARCsupport
Contributor

Server Monitoring Tools - Advice, Direction, help please

Hello there,

First off, let me say ESXi is amazing and flat out works awesome! I'm a nub when it comes to VMware, and I just finished upgrading/migrating our entire company from an SBS2000 shop to Server 2008 Standard VMs hosted on ESXi. Let me tell you, setting up the ESXi hosts was the easiest part of that whole ordeal lol.

I'm looking for a tool that can monitor ESXi hosts, send alerts via email/SMS, and potentially shut down VMs. For example: say the AC unit dies on an extremely hot weekend day and our machines are overheating. Is there a tool that will alert via SMTP/SMS and initiate a shutdown of our VMs and then the host PMs? I'm finding it hard to grasp how this is possible, because I don't see how software running in a VM could access the hardware conditions of the actual ESXi host. At a minimum, I'm looking for a tool that can send alerts via email/SMS if there is a problem, so I'd be able to address the issue before any damage occurred.

We currently have two ESXi 4.0 hosts on Dell PowerEdge T610 servers. I used the ESXi version specifically for Dell servers (if that helps).

We run three virtual machines: two Server 2008 Standard and one Server 2008 R2. If you know of any tools that would help me achieve what I've mentioned above, I would greatly appreciate it. I did a fair amount of searching on the web but have not yet found a solution that fits all our needs. It can be open source or payware.

Thanks for your time,

Dustin

1 Solution

Accepted Solutions
DSTAVERT
Immortal

I use a temperature monitoring device (http://www.temperaturealert.com/). It would require some scripting, but it could be combined with J1mbo's scripts to create a proper shutdown.

There are probably hundreds of things to be worried about. Don't get too hung up on a single one.

-- David -- VMware Communities Moderator

14 Replies
J1mbo
Virtuoso

This should help: http://blog.peacon.co.uk/hardware-health-alerting-with-esxi/

Thermal shutdown is entirely possible, but you need to consider the rate of rise. The best way to test this is to turn off the AC during a maintenance window and time how long the room takes to reach a critical point (perhaps 40°C).
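As a rough illustration of the rate-of-rise idea, here is a minimal Python sketch that estimates how long the room has before it reaches the critical point. It assumes the temperature rises roughly linearly during the test, and the readings below are made-up numbers, not measurements from this thread.

```python
def minutes_until_critical(readings, critical_c):
    """Estimate minutes left before the room reaches critical_c.

    readings: list of (minutes_elapsed, temp_c) tuples, oldest first.
    Assumes a roughly linear rise between the first and last reading.
    Returns None if the temperature is not rising.
    """
    (t0, c0), (t1, c1) = readings[0], readings[-1]
    rate = (c1 - c0) / (t1 - t0)  # degrees C per minute
    if rate <= 0:
        return None
    return max(0.0, (critical_c - c1) / rate)


# Hypothetical readings taken with the AC switched off
readings = [(0, 22.0), (10, 25.0), (20, 28.0)]
remaining = minutes_until_critical(readings, critical_c=40.0)
print(f"About {remaining:.0f} minutes until 40 C")
```

The result is the time budget the whole alert-and-shutdown sequence has to fit into, so it is worth leaving a generous margin.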

Since you have the Dell version installed, you can probably also manage them with the Dell OpenManage Server Assistant?

http://blog.peacon.co.uk

Please award points to any useful answer.

LucasAlbers
Expert

damn cool.

We have a mix of free and paid versions and this sure would be helpful.

ARCsupport
Contributor

Hey, thanks for the link. I couldn't get the Perl script from the Peacon blog to work (I kept getting compile errors), but I was able to use the original script that William Lam wrote. It's pretty limited/simple, but it's better than nothing.

I'm looking into the UPS shutdown info, but it doesn't look like there is any easy way to use it for ESXi 4.

ProPenguin
Hot Shot

As far as ESXi logging and setting up alerts for certain issues goes, I am using syslog to push event logs to a server with Splunk installed. You should be able to set up alerts in Splunk based on the event. It also makes for a great and easy way to search through the logs.

Hope this helps.

golddiggie
Champion

What it sounds like you're looking to do is just one of the things you can do with a vCenter Server... Since you have two hosts, you could go with either the vCenter Server Foundation license or the vCenter Server for Essentials license, depending on which license set you already purchased. If you already have vCenter Server, then you can configure the Alarms to send you emails when an event occurs. You can also modify the Alarm settings, or even create new alarms for events that you want logged or emailed to you.

A lot of what you'll have for options depends on which licenses you purchased for the ESXi servers...

For the thermal type events, a lot depends on how the host reports the temperatures to vCenter... But, if you're having concerns about the AC unit failing during hot days, you have bigger issues you need to resolve.

For power management functions within the vCenter/vSphere configuration, you would need the license that includes DPM, but even then it's not going to power down hosts due to thermal events, only when demand is low enough for everything to run on the other host. To do what I think you're asking for, you'll need to configure a monitoring system (or get thermal monitoring hardware, such as that offered by IT Watch Dogs). Depending on what you're using for UPS devices, you might be able to create scripts that monitor the thermal device and then prompt the UPS to start powering off the hosts. You will need the correct API for the UPS in place so that you can have the host vMotion critical VMs from one host to another, to keep company-critical servers running...

Personally, I'd rather not automate such things. I'd rather get an email alert that the temps are above acceptable parameters, and then remote in and start the migration of VMs to one host and power off the other. Or power off both hosts, and other hardware too...

A solution that will probably help you sleep better at night would be to get a better AC unit for the server room. Or a portable unit to augment the existing unit until you can get that resolved/replaced...

VMware VCP4

Consider awarding points for "helpful" and/or "correct" answers.

J1mbo
Virtuoso

Re performing a host shutdown, esxi-control.pl has this functionality, or you could enable SSH and use plink.

Please post any info on the compile errors you found in esx-health.pl here, PM me, or post in the Peacon blog comments (http://blog.peacon.co.uk/hardware-health-alerting-with-esxi/#comments) :)

Cheers

ARCsupport
Contributor

Thanks, golddiggie. I've been looking into the vCenter Essentials license, but we don't need 90% of the features you get, which makes it hard to justify the cost. Even though it's really not much money, I don't get to make the final decision on that. We haven't purchased any licenses for ESXi thus far; we're just running the free stuff.

I'm not so much concerned about the AC unit failing; it's brand new and actually lives in the basement under the floor of the server cabinet. I'd just like to have a procedure in place in case it ever does fail. It is a machine with moving parts, so it definitely can fail.

I checked out IT Watch Dogs and they have some nice stuff, thanks for that link! I'm sure we could use some of their equipment in the future.

I've been playing with Veeam's free monitoring tool and it works pretty well so far. It has the ability to run scripts automatically when an alarm is tripped, if needed. The problem with Veeam Monitor is that the hardware monitor is very broad; I'd like to set an alarm based strictly off the case temp in our ESXi hosts. I'm going to give the full version a try for 60 days and see if it adds more specific definitions for alarms/notifications. We certainly could remote in and start shutting down machines based off a notification alarm. But what happens if neither my boss nor I can get to a computer with internet access in time? It wouldn't take long for the servers to heat up if the AC unit were to fail.

ARCsupport
Contributor

Hey J1mbo,

I've got esxi-health working great now, nice job on that! The problem I had earlier was operator error on my part, sorry about that.

I think I've got an idea of how I'm going to accomplish an automated and graceful shutdown using esxi-control.pl. The one part I'm still trying to figure out is how to initiate esxi-control.pl based off a specified thermal range/condition. I was thinking of using Veeam's monitor tool to run it if a hardware alarm enters the "alert" state. But unless I can initiate the alarm based solely off the temp sensor (which you can't), Veeam isn't going to work for us.

Like I said, I'm a nub to ESXi and a total nub to dealing with Perl, but I'm trying to learn! The time and effort you're putting in to try and help me is much appreciated!

Thanks.

simonlam
Contributor

Check the eG VM Monitor - http://www.eginnovations.com/web/vmware.htm

Regards,

Simon

VeraxSystems
Contributor

Give the VMware plug-in for Verax NMS a try (http://www.veraxsystems.com/en/products/nms). Notifications can be sent out by email or SMS, and by applying processing rules you can perform actions triggered by alarms.

ARCsupport
Contributor

So after a long time of leaving this issue on the back burner, I was finally able to spend some time and come up with an acceptable solution.

I used a combination of the esxi-control.pl script, a PowerShell script I wrote, and a product from a third-party vendor named Temperature Alert. We used their $129 USB version: http://www.temperaturealert.com/Wireless-Temperature-Store/Temperature-Alert-USB-Sensor.aspx

Here's a quick rundown of our environment and how this works.

In our server cabinet we have four machines: two ESXi 4 hosts on Dell PowerEdge T610 hardware, one older Dell server that runs our accounting system, and our voicemail server:

-Host #1: has one Server 2008 VM, which is our PDC/DNS/DHCP/file/print server.

-Host #2: has two VMs: one Server 2008 that's our SQL 2008 DB server and one Server 2008 R2 that's our Exchange 2010 server.

Our fear was that one day the small A/C unit, which is mounted to the ceiling in the basement below the server cabinet, might fail and damage hardware or cause a failure while the business(es) are closed.

Our first line of defense was notification. The Temp@lert USB product handled this with its ability to send out an email when the temperature exceeds a defined level. In this case, if the temp was getting out of control, my boss or I could log in remotely and shut down the machines. But what happens if neither one of us is available to log in remotely? This motivated us to find a way to automatically and gracefully shut down both the VMs and the hosts based off the cabinet temperature.

I initially wanted the Temp Alert software to execute a batch file upon reaching a certain temperature; the batch file would call the Perl script, thus gracefully shutting down our ESXi hosts and VMs. Nope, not happening. The Temp Alert software can only execute PowerShell scripts, and it runs the script every time it samples the temperature, in our case every 120 seconds.

I contacted Temp Alert tech support and was provided an example PS script that records the temp data to an XML file, which can later be parsed as needed. After tweaking the PS script I received from them, I went on to write a new PowerShell script to parse the data in the XML file. That PS script runs as a scheduled task on our accounting server and does the following: sends out a warning email when the temp reaches a defined level, shuts down the ESXi hosts, and shuts down the accounting server.

I used an old workstation with an evaluation copy of ESXi 4 to test with, and last weekend we tested it on our production servers. We shut off the A/C and basically sat there and watched the temp rise. It worked perfectly! It only took about 50 minutes for the cabinet to exceed 95°F. The warning email went out, the email with the "shutdown imminent" message went out, and all machines shut down cleanly. I also added our Verizon mobile phone numbers @vtext.com to a new Exchange distribution group so we receive SMS messages with the temperature/shutdown information, in case we are somewhere outside of data coverage.
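For anyone wanting to reproduce the email/SMS part without an Exchange distribution group, here is a minimal Python sketch of the same idea: one message addressed to both a normal mailbox and a carrier email-to-SMS address. The SMTP host, sender, mailbox, and phone number are all placeholders I made up; only the @vtext.com gateway domain comes from the setup described above.

```python
from email.message import EmailMessage
import smtplib

SMTP_HOST = "mail.example.local"        # hypothetical mail server
SENDER = "temp-monitor@example.local"   # hypothetical sender address
RECIPIENTS = [
    "admin@example.local",              # hypothetical mailbox
    "5551234567@vtext.com",             # hypothetical number at the SMS gateway
]


def build_alert(temp_f, shutting_down):
    """Build a short alert; SMS gateways tend to truncate long messages."""
    state = "SHUTDOWN IMMINENT" if shutting_down else "WARNING"
    msg = EmailMessage()
    msg["Subject"] = f"{state}: cabinet at {temp_f:.1f} F"
    msg["From"] = SENDER
    msg["To"] = ", ".join(RECIPIENTS)
    msg.set_content(f"Cabinet temperature is {temp_f:.1f} F.")
    return msg


def send_alert(msg):
    """Hand the message to the SMTP server (not called in this sketch)."""
    with smtplib.SMTP(SMTP_HOST) as server:
        server.send_message(msg)


alert = build_alert(96.5, shutting_down=True)
print(alert["Subject"])
```

Keeping the message building separate from the sending makes it easy to check the wording without a mail server on hand.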

Basically the order of operation is this:

Temp Alert software reads the temp >> writes/overwrites the XML file with temperature information every 120 seconds

>> Scheduled task runs the PS script every 4 minutes on the accounting server, which is where the Temp Alert USB device is installed

>> PowerShell script parses the data from the XML file; if the thresholds defined in the PS script are met, it sends a warning email, or a "shutdown imminent" email followed by the host shutdown.
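The parse-and-decide step can be sketched in a few lines. This is Python rather than the PowerShell actually used, and both the XML layout and the threshold values are assumptions for illustration; the real file written by the Temp Alert script may look different.

```python
import xml.etree.ElementTree as ET

WARN_F = 85.0      # hypothetical warning threshold
SHUTDOWN_F = 95.0  # hypothetical shutdown threshold


def read_temperature(xml_text):
    """Pull the latest reading out of the XML (assumed layout)."""
    root = ET.fromstring(xml_text)
    node = root.find("temperature")
    if node is None or node.text is None:
        raise ValueError("no temperature element found")
    return float(node.text)


def decide(temp_f):
    """Map a reading to the action the scheduled task should take."""
    if temp_f >= SHUTDOWN_F:
        return "shutdown"
    if temp_f >= WARN_F:
        return "warn"
    return "ok"


# Hypothetical file contents
sample = "<reading><temperature>96.5</temperature></reading>"
print(decide(read_temperature(sample)))
```

Checking the shutdown threshold first means a reading above both levels never gets treated as a mere warning.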

To shut down the hosts, the PS script executes batch files, which call the esxi-control.pl script and provide the credentials necessary to perform the shutdown of the host. And since the VMs have VMware Tools installed, the host is able to cleanly shut down the guest machines. Not too bad for only having to spend $129 on a simple USB temperature device.
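The ordering of the shutdown matters, so here is a minimal Python sketch of that sequence with a dry-run switch for testing. The batch file names are hypothetical stand-ins for the wrappers that call esxi-control.pl (their contents and arguments are not shown in this thread), and the 60-second delay on the Windows shutdown command is my own choice.

```python
import subprocess

# Ordered steps: both ESXi hosts first, then the machine running this script.
SHUTDOWN_STEPS = [
    ["cmd", "/c", "shutdown-host1.bat"],  # hypothetical wrapper batch file
    ["cmd", "/c", "shutdown-host2.bat"],  # hypothetical wrapper batch file
    ["shutdown", "/s", "/t", "60"],       # Windows shutdown of the local server
]


def run_shutdown(steps, dry_run=True):
    """Run each step in order; with dry_run=True only report what would run."""
    executed = []
    for cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True stops the sequence if a step fails
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed


run_shutdown(SHUTDOWN_STEPS, dry_run=True)
```

The local server goes last because it is the one running the script; shutting it down earlier would stop the sequence before the hosts were handled.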

ARCsupport
Contributor

DSTAVERT wrote:

I use a temperature monitoring device (http://www.temperaturealert.com/). It would require some scripting, but it could be combined with J1mbo's scripts to create a proper shutdown.

There are probably hundreds of things to be worried about. Don't get too hung up on a single one.

This is exactly what we did and it works great.

levina
Contributor

Check the VMware Monitor from MindArray IPM (http://www.mindarraysystems.com/vmware-performance-monitoring-tools.php). It allows setting thresholds on temperature, fan, and other hardware sensors, and it also does performance monitoring out of the box.