VMware Cloud Community
krisdeluxe
Contributor

VMware Alerts, need help

Good Day,

I am new to VMware and have just started my first sysadmin job. I am the only IT person here, so I need help.

I have a ProLiant DL580 G5 acting as one of our ESXi hosts.

I see that there are some alerts (screenshot attached). My main concern is the controller. Disk 4 shows an error, so do I replace Disk 4? Or is the controller bad, so I only need to replace that? Or do I replace both?

I also have a battery alert but am not sure what to do about it. I also assume that if I need to replace the battery and the controller, I have to power down the host.

If it is only the hard drive, can I just swap out the bad drive and put in the new one? Or do I have to break the RAID first? My understanding is that since it is RAID 5, I should be able to just swap the drive and the controller will do the rest.

Please help

Thank you.

(Attached: vmwarealerts.jpg)


Accepted Solutions
a_p_
Leadership

If I interpret the screenshot correctly, you have two issues: a bad battery (if this is the original one for the G5 host, that's likely) and a failed HDD. Assuming you are using the hot-swappable 146GB SAS HDDs, you can replace the disk online, i.e. pull out the bad disk, wait at least a minute or two for the controller to recognize that it has been removed, and then insert the new disk. The rebuild should start shortly after the new disk is in place. As for the bad battery, I don't think it can be replaced with the host powered on, so you'd better replace it with the host powered off to avoid any risk.

André
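The hot-swap advice works because RAID 5 stores XOR parity: with any one disk missing, the controller can recompute that disk's data from the remaining disks. That is why the array keeps running in a degraded state and why you don't have to break the RAID before swapping the drive. The following is a minimal Python sketch of that idea, not HP Smart Array logic; the function names are hypothetical, and it assumes a single stripe with parity on a fixed disk and tiny 4-byte blocks.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def make_stripe(data_blocks):
    """Return one stripe: the data blocks followed by their XOR parity block.

    Simplification: parity always sits on the last disk here, whereas real
    RAID 5 rotates the parity block across all member disks.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute the failed disk's block from the surviving blocks.

    This is what lets a degraded RAID 5 array keep serving data with one
    disk missing, and what gets written onto the replacement disk.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Four "disks": three data blocks plus one parity block.
    disks = make_stripe([b"VM01", b"VM02", b"LOGS"])
    disks[1] = None               # the failed disk is pulled
    disks[1] = rebuild(disks, 1)  # a new disk is inserted and rebuilt
    print(disks[1])               # b'VM02'
```

Note that RAID 5 only survives one failed disk at a time, so until the rebuild finishes, a second disk failure would lose data.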

27 Replies
weinstein5
Immortal

Welcome to the Community. Before giving any direction on how to respond, we will need more information on the errors being thrown. What version of ESXi are you running?

I am also moving the question to a more appropriate forum.

krisdeluxe
Contributor

Great, thank you.

We are running ESXi 4.1.0.

proden20
Hot Shot

Are you running a vSphere cluster? If so, put the host in maintenance mode (let it migrate the VMs off first) and run diagnostics from the HP Service Pack for ProLiant (SPP). A G5 likely has a good number of firmware updates available for it (also on the SPP); I'm constantly updating my batch of G7s.

If you don't have maintenance mode, you may want to schedule some downtime to shut down the host and VMs and run diagnostics. From the looks of things, you've got a bad disk, and the controller is running your RAID in degraded mode. HP Diagnostics or the boot-time Smart Array configuration utility should give you more information.

HP Service Pack for ProLiant

krisdeluxe
Contributor

No cluster, so I guess I will have to manually migrate the VMs to our other server and then work on this host. If I move the VMs from one server to the other manually, can I do it without any downtime?

krisdeluxe
Contributor

Actually, I'm not sure now. How do I tell whether I have a vSphere cluster?

proden20
Hot Shot

You would need a vMotion port group properly configured on each host, with the same port group name, in the same subnet. Try right-clicking each VM, selecting "Migrate," and working through the wizard. The wizard will tell you whether other hosts in your inventory are valid migration targets.

If it doesn't work, we can see where you stand with vMotion. Repeat the following for your source and destination hosts:

1.) Highlight your host

2.) Click the "Configuration" tab

3.) Select "Networking" on the left side

4.) Take a screenshot of the networks you see there

5.) Post the screenshot to this thread
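The requirements above (a VMkernel port enabled for vMotion, the same port group name on both hosts, and addresses in the same subnet) can be written down as a quick pre-flight check. This is a hypothetical Python sketch, not a VMware tool: the `VmkPort` record and `vmotion_problems` function are invented for illustration, you would fill them in by hand from each host's Networking view, and the Migrate wizard's own validation remains the authoritative answer. It deliberately covers only the checks from this post; vMotion has other prerequisites, such as storage visible to both hosts and compatible CPUs.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class VmkPort:
    """Hypothetical summary of one host's VMkernel port, copied by hand
    from the Configuration > Networking view (not a vSphere API object)."""
    portgroup: str
    ip: str
    netmask: str
    vmotion_enabled: bool


def vmotion_problems(source: VmkPort, dest: VmkPort) -> list[str]:
    """List the reasons a source/destination pair fails the basic checks;
    an empty list means these particular checks pass."""
    problems = []
    if not source.vmotion_enabled:
        problems.append("source port is not enabled for vMotion")
    if not dest.vmotion_enabled:
        problems.append("destination port is not enabled for vMotion")
    if source.portgroup != dest.portgroup:
        problems.append("port group names differ")
    # ip_interface accepts "address/dotted-netmask" and derives the network.
    src_net = ipaddress.ip_interface(f"{source.ip}/{source.netmask}").network
    dst_net = ipaddress.ip_interface(f"{dest.ip}/{dest.netmask}").network
    if src_net != dst_net:
        problems.append(f"different subnets: {src_net} vs {dst_net}")
    return problems


if __name__ == "__main__":
    # Example values only; substitute what your own hosts show.
    host_a = VmkPort("Management Network", "192.168.10.21", "255.255.255.0", True)
    host_b = VmkPort("Management Network", "192.168.10.22", "255.255.255.0", True)
    print(vmotion_problems(host_a, host_b))  # []
```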

proden20
Hot Shot

Click Home -> Hosts and Clusters -> (expand everything in the left-hand side) -> Post a screenshot

krisdeluxe
Contributor

Here is the networking screenshot.

(Attached: network.jpg)

krisdeluxe
Contributor

What about the controller? Or is that just because the disk is bad?

krisdeluxe
Contributor

I think this is what you meant:

(Attached: cluster.jpg)

proden20
Hot Shot

"Montreal" is your "Datacenter." A Datacenter is an inventory object that can contain VMs, hosts, and clusters.

In this situation, you do not have a cluster defined, and your hosts are standalone. I'm taking a wild guess that the management port shown in your networking screenshot is flagged to carry vMotion traffic. Did you try to migrate your VMs to the other host? If vMotion succeeds, the validation checks passed and your VMs should stay online.

As for the controller: if you look below it, you'll see a warning on the logical volume that the controller manages. My guess is that the warning is rolling up to the controller, and the controller reports the degraded state as an "error."
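The "rolling up" described here can be pictured as a worst-state roll-up: a parent object displays the worst status found anywhere beneath it, so a healthy controller can still show an error because of a failed disk or degraded volume attached to it. This Python sketch is only an assumption about that general pattern; it is not how vSphere or HP's providers actually compute hardware status, and the component names and three-level status scale are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical status scale; a larger value means a worse state."""
    NORMAL = 0
    WARNING = 1
    ERROR = 2


@dataclass
class Component:
    name: str
    own_health: Health
    children: list[Component] = field(default_factory=list)

    def displayed_health(self) -> Health:
        """Worst state among this component and everything beneath it."""
        states = [self.own_health] + [c.displayed_health() for c in self.children]
        return max(states)


if __name__ == "__main__":
    volume = Component("Logical volume (RAID 5)", Health.WARNING)
    disk4 = Component("Disk 4", Health.ERROR)
    # The controller itself is fine, but it displays its children's worst state.
    controller = Component("Smart Array controller", Health.NORMAL, [volume, disk4])
    print(controller.displayed_health().name)  # ERROR
```

Under this model, replacing Disk 4 and letting the volume rebuild would clear the controller's alert without touching the controller itself.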

krisdeluxe
Contributor

To see whether vMotion works correctly, I guess I can try migrating one of the powered-off VMs first. And I suppose I should not select the option to move the datastore?

So I need to replace the battery and the hard drive?

Thank you all so much, this is a great help and really appreciated :)

proden20
Hot Shot

Correct: do not move the VM to another datastore, just to the other host.

As far as your controller, disk, and battery go, I'd personally stick to migrating your VMs (if the system permits it) and running the HP diagnostics. While I'm confident you need a disk replacement (by the way, are there any amber or red lights on this server's drives?), I'm not 100% sold on the "battery replacement" warning. I've seen it over and over on different vendors' disk controllers for years. The HP diagnostics should tell you the real story behind what's going on with your hardware.

krisdeluxe
Contributor

OK, can I run the HP diagnostics while VMware is still running, or do I have to take the host down to run them?

Basically, how do I run the diagnostics?

Thank You

proden20
Hot Shot

To run the diagnostics, you need to boot from the SPP CD (which you burn from the ISO image I linked to). When it's done booting, you'll get a menu that gives you access to the diagnostics GUI.

A question for you: if you need to replace the drive, are you planning on getting the replacement from HP support? If so, I'd suggest calling HP tech support to diagnose the problem and help you get familiar with the ProLiant server environment (iLO, Smart Array, SPP/SUM).

Sounds like a good learning opportunity.

krisdeluxe
Contributor

Yes it is a great learning opportunity.

I am checking whether we have a vendor we buy from directly.

I do not see the link...

There is no more warranty, so calling HP is no good, and it takes a very long time to get approval for anything here.

proden20
Hot Shot

Link:

HP Service Pack for ProLiant

HP's online support portal will still let you download the latest firmware and drivers even though you are out of warranty.

a_p_
Leadership

krisdeluxe wrote: "What about the controller? Or is that just because the disk is bad?"

From my experience with HP hardware over the last few years, I'd rule out a defective controller. The alert on the controller may be the result either of the failed components or of buggy HP Offline Utilities. IIRC, it was the February version that displayed issues with the controller.


André
