VMware Cloud Community
breakaway9000
Enthusiast

ESXi + IBM x3500 7977: RAID Monitoring

I'm using an IBM x3500 (7977) system for ESXi. It has an IBM ServeRAID 8k controller in it, with 8 x 300GB 10,000RPM SAS drives.

Currently, the server runs Windows Server 2003. It has IBM's Java-based ServeRAID management tool, which notifies me when a disk is faulty, when a RAID controller battery is starting to drop out, etc. It has a feature where I can right-click the faulty drive on screen and then click "Identify Drive", and it flashes a light on the chosen drive so I can see which physical drive it is.

If I switch the server to ESXi, will these features still be available? They're pretty critical. If yes, I guess I'll need some sort of driver/application combo from IBM: the driver installed on the server, and the management application on a networked PC that connects to the ESXi server. I'm reasonably well versed in working in the Linux shell, so I'm not afraid to get my hands dirty and recompile some drivers etc. if need be.

Is my understanding of this correct?

If yes to all the above, which one of these drivers do I need?

Thanks in advance - Any Insight Appreciated

35 Replies
snowdog_2112
Enthusiast

Looks like we're in the same boat - floating, but half filled with water.

Veeam monitor is super easy to install and connect to the host.  There is a good deal of customization possible - it looks like it uses the same alarm definitions as the host.

What is odd is that I got Veeam emails every 10 minutes since last night saying "device health alarm" for the degraded array (I plugged the disk back in but it did not auto-rebuild since it had data on it).  So Veeam is seeing the alarm at the host level, but vCenter does not.

I have all IBM hardware with the IBM-specific build of ESXi 4.1.  The hardware health status updates as expected when I pull a power cord or a disk.  However, not even the default alarm definitions trigger the alarm at the vCenter level.  It's not that it doesn't send an email or SNMP trap - it's flat not triggered.  (Other alarms are working and sending emails - like putting a host in Maintenance mode, so I know *some* alarms work).

So, the only way I know a disk is dead is by looking at the Health Status - but for that you have to connect your viClient directly to the host. If you connect the client to vCenter, there is no Health Status item on the Configuration tab!

I had run vihostupdate with 2 different update-bundles containing drivers for the LSI controller.  Neither would cause the host to be discovered in the LSI MegaRAID Storage Manager installed on a VM or remote box.

The MegaCLI copied to the \bin folder on the host for the LSI controllers *does* work as expected, and I am able to perform all controller functions (array status, remove a disk, replace a disk, initiate a rebuild) without rebooting the host.
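
Since the MegaCLI output is plain text, the array status can be checked from a script instead of by eye. Below is a rough Python sketch that scans a logical-drive listing for arrays that are not Optimal. It assumes the listing has `Virtual Drive:` (or `Virtual Disk:`) header lines followed by a `State : ...` line, which is what I would expect from `-LDInfo -Lall -aALL`, but compare it against your own output first. The function name and the sample text are made up for illustration.

```python
import re

# Header line such as "Virtual Drive: 0 (Target Id: 0)" (assumed format).
_VD_RE = re.compile(r"^Virtual (?:Drive|Disk)\s*:\s*(\d+)", re.IGNORECASE)
# State line such as "State               : Optimal" (assumed format).
_STATE_RE = re.compile(r"^State\s*:\s*(.+?)\s*$", re.IGNORECASE)


def find_unhealthy_arrays(ldinfo_text):
    """Return [(drive_number, state), ...] for arrays whose state is not Optimal."""
    unhealthy = []
    current = None  # number of the virtual drive whose State line we are waiting for
    for raw in ldinfo_text.splitlines():
        line = raw.strip()
        header = _VD_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        state = _STATE_RE.match(line)
        if state and current is not None:
            if state.group(1).lower() != "optimal":
                unhealthy.append((current, state.group(1)))
            current = None  # only the first State line after a header counts
    return unhealthy


# Hypothetical sample text, shaped like the assumed listing format.
SAMPLE = """\
Virtual Drive: 0 (Target Id: 0)
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
State               : Optimal
Virtual Drive: 1 (Target Id: 1)
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
State               : Degraded
"""

if __name__ == "__main__":
    for number, state in find_unhealthy_arrays(SAMPLE):
        print("Virtual drive %d is %s" % (number, state))
```

The parser only works on text, so it can run anywhere the MegaCLI output ends up (for example, captured over SSH and checked on another machine), and whatever it returns can be fed into an email or other alert of your choosing.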

This means that I can recover from a disk failure, but I still can't get anything (aside from Veeam Monitor) to tell me there *is* a failure.  And Veeam only seems to tell me there is a problem, not specifically what has failed.  At this point, however, I'll accept that over flying completely blind.

Let me know if you find resolution - I'll do the same.  It's nice to know I'm not the only one with a bruised forehead from banging it against the wall!

breakaway9000
Enthusiast

I believe one more thing you could try is obtaining an IBM RSA-II card for your server. The built-in BMC (baseboard management controller) IPMI is far too lacking - it can't function by itself (i.e. it needs software like IBM Director to read sensor status, and even then it doesn't read all the sensors).

So theoretically, you could get an RSA-II card, then set up e-mail notifications for when there is an error condition. The beauty of this system is that it is 100% independent of the OS that's running on the server, so it should keep working with any OS you put on it in the future.


I've read the manual for IBM's RSA-II card. It can read the system sensors (memory, PSU, hard disk drives, etc.) and then e-mail a contact (i.e. you) via an SMTP server if needed.

I've been looking for an RSA-II card for my x3500 7977 (http://www-947.ibm.com/support/entry/portal/docdisplay?brand=5000008&lndocid=MIGR-64240), but very few seem to be available second hand. I will call IBM later to find out how much it will be.

At the moment this seems to be the cheapest and most hassle-free way to set up the kind of monitoring we want.

And yes, it's nice to see I'm not the only one having this problem. What I can't wrap my head around is how everyone else is doing this. Perhaps they all have 'supported' hardware. Looks like everyone else can afford to drop tens of thousands of dollars on 'supported' servers and SANs.

SeanFromIT
Contributor

I agree, your best bet is setting up RSA/IMM + Director. This is done by running another network cable to the management port and configuring its IP in the system BIOS. Then in IBM Director you point at this management IP, NOT the ESXi IP. You can also visit the management IP in your browser and see some health info that way, without Director. But Director is needed for e-mail alerts as far as I know.

breakaway9000
Enthusiast

http://www.ibm.com/support/docview.wss?uid=psg1MIGR-57091

^ Manual for RSA II. It clearly states that you can enter an SMTP server in there for failure notifications. So you don't need IBM Director.


The 'newer IBM servers' (i.e. System x3500 etc.) come with a BMC IPMI integrated (a separate Ethernet port on the back of the server, IP controlled through BIOS), but it is severely crippled - you can't access it by simply entering its IP address in the browser; you need to install IBM Director. The BMC IPMI also can't read all sensors, such as disk drive status.

snowdog_2112
Enthusiast

I think I've had a bit of a breakthrough.

The built-in IMM is capable of providing enough of an alert.  I configured the Network Protocols to send an email, and when I pulled a disk, I got an email in less than 10 seconds.  I actually received SIX emails from the IMM for the event.  Wow!

So, between the IMM, an external SMTP/SNMP service (I don't want to use services hosted on the box in case VMware itself crashes and the hosted VM's are down at the same time), vCenter on a hosted VM, and the MegaCLI on the ESXi host, I can monitor, get alerts, replace, and rebuild a failed RAID array without rebooting the physical host.

Let me know if you need any information from my testing.  I'd be happy to share my hard-won knowledge!!!

It was a bit of a trial-and-error process to figure out the MegaCLI bit to replace a disk and get the array to rebuild.  The auto-rebuild did not fire for me, even after pulling the disk, putting it in another physical host and writing 0's to the disk (using the Preboot CLI with the PDClear command).  I don't know if there was something related to the serial number or other identifier on the physical disk, because I put the same disk back into the array that I pulled - something I'd not do in the event of a real failure.

Thank you all for keeping this issue going!

DSTAVERT
Immortal

You could consider creating a Document from this post and adding some additional details.

-- David -- VMware Communities Moderator

breakaway9000
Enthusiast

snowdog_2112 wrote:

I think I've had a bit of a breakthrough.

The built-in IMM is capable of providing enough of an alert.  I configured the Network Protocols to send an email, and when I pulled a disk, I got an email in less than 10 seconds.  I actually received SIX emails from the IMM for the event.  Wow!

The IMM meaning what exactly? For this to work, do the drives need to be 'visible' to ESXi or is it independent of the OS (such as the RSA-II IPMI)?

snowdog_2112
Enthusiast

IMM is the new name for the RSA/RSA-II module.  If you're familiar with RSA, the IMM is pretty much the same thing (for our purposes here).

I'm attaching some screenshots of my IMM and the event when I pulled disk #3.

On the newer IBM server hardware it's integrated on the board.  There's a Management Port (RJ45) on the box, and the configuration for IP address is in the hardware BIOS.  For example, on the boxes I get (x3500's and x3650's), I have 2 Ethernet ports and 1 MGMT port.

The IMM is accessed the same way as the RSA, via a web browser pointed at the IP specified in the BIOS.  You can power on/off the physical box. It also has Console access in the browser (it seems to work best in Firefox) - so you can access the VMware Support shell if you don't have SSH access to the ESXi console.

It is completely independent of the OS.  In my testing, I pointed the SMTP configuration of the IMM to a Windows VM I have.  It can also be configured to send SNMP alerts.

breakaway9000
Enthusiast

What is the port that you have to access the IMM on? I.e. what's the URL?

I'm very keen to give this a try right now. If it worked for me as well, it'd solve all my problems! However, I have an x3500.

snowdog_2112
Enthusiast

In the x3500 M3, the IP for the IMM is set in the POST BIOS (i.e., press F1 for Setup at the POST screen).  I just gave it an IP on my subnet, and then pointed a browser to that IP.

For example - the one I am using for this testing is sitting on 192.168.1.52.

Point a browser to http://192.168.1.52 and you get the logon screen shown in the PDF I attached.

Configuring the email alerts is under Network Protocols in the menu on the left.  Just set an SMTP server (make sure your SMTP server allows relay from the IP address of the IMM).  Then configure the Alerts to add an alert for your email address and the level of alerting you want.
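
If the alerts never arrive, it can help to confirm separately that the SMTP server accepts and delivers mail at all. Here is a minimal Python sketch using the standard `smtplib` module. The host name and addresses in the usage comment are placeholders, the function names are made up for illustration, and note that it tests relay from whatever machine runs it, not from the IMM's own IP, so the server's relay rule for the IMM address still has to be checked on the server side.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a short plain-text test mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "IMM relay test"
    msg.set_content("If this arrives, the SMTP server accepted mail from this machine.")
    return msg


def send_test_message(smtp_host, sender, recipient, port=25, timeout=10):
    """Send the test mail; smtplib raises an exception if the server refuses it."""
    msg = build_test_message(sender, recipient)
    with smtplib.SMTP(smtp_host, port, timeout=timeout) as server:
        server.send_message(msg)


# Example call (placeholder values, not taken from any real setup):
# send_test_message("smtp.example.com", "imm@example.com", "admin@example.com")
```

Unauthenticated mail on port 25 is an assumption here; if your server requires a login or TLS, the sketch would need `login()` or `starttls()` calls added.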

Again, this is on the newer 3500's with the IMM.  I have a couple of x3500's out in the field with the RSA-II modules, and I believe the process is pretty much the same for those - though I can't recall whether the settings for the IP address on the RSA are in the F1 - Setup menu at the IBM POST or not...

breakaway9000
Enthusiast

Yeah that's what I've been trying. But my web browser times out, even though I can ping the IP address of the IMM.


I think that's because the actual RJ-45 port on the back and the RSA-II adapter are separate parts. The RSA-II is a 'daughterboard' that sits on the motherboard, I believe. Without it, the built-in IPMI is too crippled and can't read much info from the system sensors at all.

Without the RSA-II adapter in system, I think you get substantially crippled functionality out of the IPMI.

snowdog_2112
Enthusiast

That is true.  If you don't have the RSA add-on (it was a $250-300 option on the older 3500's), then you'd have the port, but limited functionality.

Can you telnet to the IP you assigned?  The RSA and IMM both have telnet/ssh interfaces, but I don't know if you can configure alerts via telnet.
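
A quick way to see which of those interfaces actually answer (ping working only proves the IP is up, not that anything is listening) is a small TCP probe. This is a rough Python sketch; the port list is my assumption of the usual defaults for web, telnet and SSH, and the function names are made up for illustration.

```python
import socket

# Ports worth checking; which of these the adapter really listens on is an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443, "telnet": 23, "ssh": 22}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports=DEFAULT_PORTS):
    """Return {service_name: True/False} for each port in the mapping."""
    return {name: port_is_open(host, number) for name, number in ports.items()}


# Example call (replace the address with the IP you gave the management port):
# print(probe("192.168.1.52"))
```

If every port comes back closed while ping still works, that would be consistent with the port being present but the full management interface not being there.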

In that case - I highly recommend you get yourself an RSA.  It is worth it just to have "console" access over Ethernet.  As long as there is AC power to the box, you can power it on/off and even see the POST and get into the BIOS - all *remotely*!

It's well worth the price of admission.  I would not deploy another server without that capability, especially a VM host.

In fact, since I started using the IBM boxes for my customers, I include the RSA/IMM *and* the 3-year 24x7x4hr onsite support.

Unfortunately, I've had to use that support for dead motherboards on 3 occasions, but fortunately it is available and worth every cent of the price.

breakaway9000
Enthusiast

Nope telnet / ssh doesn't work, although IBM Director does see it.

Have you ever installed an RSA-II yourself?

snowdog_2112
Enthusiast

Yes.  I have a couple x3500 M2's in customer locations with the RSA-II module installed.  We installed the module in-house before deploying the servers.

breakaway9000
Enthusiast

Does it go into a PCI slot or is it a daughterboard sort of thing onto the motherboard?

Also how much do these cost? I've contacted IBM support but they haven't gotten back to me on account of the weekend, and I'm away all of next week.

snowdog_2112
Enthusiast

As I recall, it is a daughterboard.

At the time, I was paying $250 - 300 for the module when I ordered the servers new.  Looks like they can be found for < $100 these days.

Here's the IBM link for it: http://www-947.ibm.com/support/entry/portal/docdisplay?brand=5000008&lndocid=MIGR-64240

Here's an ebay auction for one: http://cgi.ebay.com/IBM-REMOTE-SUPERVISOR-ADAPTER-RSA-II-SLIMLINE-13N0833-/170580391623?pt=LH_Defaul...
