Good Day,
I am new to VMware and have just gotten my first sysadmin job. I am the only IT person here, so I need help.
I have a Proliant DL580 G5 acting as one of our ESXi hosts.
I see that there are some alerts and have attached a pic. My main concern is the controller. Disk 4 says error, so do I replace disk 4? Or is it the controller that is bad, and I only need to replace that? Or do I replace both?
I also have a battery alert but am not sure what to do about that. I suppose that if I need to replace the battery and controller, then I have to power down the host...
If it is only the hard drive, can I just swap out the bad drive and put in the new one, or do I have to break the RAID first? My understanding is that I should be able to just swap the drive and the controller will do the rest, since it is RAID 5.
Please help
Thank You.
If I interpret the screenshot correctly, you have two issues: a bad battery (if this is the original one for the G5 host, that's likely) and a failed HDD. Assuming you are using the hot-swappable 146GB SAS HDDs, you can replace the disk online, i.e. pull out the bad disk, wait at least a minute or two for the controller to recognize the removal, and then insert the new disk. The rebuild should start shortly after the new disk is in place. As for the bad battery, I don't think it can be replaced with the host powered on, so you'd better replace it with the host powered off to avoid any risks.
André
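As a side note on why a RAID 5 set can rebuild onto a fresh disk: each stripe stores a parity block that is the XOR of its data blocks, so any single missing block can be recomputed from the surviving ones. The following is only a minimal illustration of that idea with made-up 4-byte blocks, not how the Smart Array controller is actually implemented (the function name and sample data are hypothetical).

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk RAID 5 set: three data blocks plus parity.
data = [b"disk", b"four", b"fail"]
parity = xor_blocks(data)

# Simulate losing the second data block, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'four'
```

This also shows why a second disk failure during the rebuild is fatal for RAID 5: with two blocks missing from a stripe, the XOR no longer has enough information to recover either one.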
Welcome to the Community. Before giving any direction on how to respond, we will need more information on the errors being thrown. What version of ESXi are you running?
I am also moving the question to a more appropriate forum.
Great, Thank You,
we are running ESXi 4.1.0
Are you running a vSphere cluster? If so, put the host in maintenance mode (allow it to migrate the other VMs off) and run diagnostics from the HP Service Pack for Proliant (SPP). A G5 likely has a good deal of firmware updates available for it (also available on the SPP); I'm constantly updating my batch of G7s.
You may want to schedule some downtime to shut down the host/VM's and run diagnostics if you don't have maintenance mode. From the looks of things, you've got a bad disk and the controller is running your RAID in an impaired mode. HP Diagnostics or the boot-time Smart Array Controller config context should give you further info.
No cluster, so I guess I will have to manually migrate the VMs to our other server and then work on it. Is that possible, so that there will not be any downtime?
So, if I manually move the VMs from one server to another, can I do it without any downtime?
Actually, I'm not sure now. How do I tell if I have a vSphere cluster?
You would need to have a vMotion port group properly configured on each host, with the same port group name, in the same subnet. Try right-clicking each VM and selecting "Migrate," then working through the wizard. The wizard will tell you whether other hosts in your inventory are valid migration targets.
If it doesn't work, we can see where you're at vMotion-wise (repeat this for your source and destination hosts):
1.) Highlight your host
2.) Click the "Configuration" tab
3.) Select "Networking" on the left side
4.) Take a screenshot of the networks you see there
5.) Post the screenshot to this thread
To check whether you have a cluster: click Home -> Hosts and Clusters -> (expand everything in the left-hand side) -> post a screenshot.
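The basic vMotion prerequisites mentioned above (same port group name on both hosts, VMkernel ports in the same subnet, and the port flagged for vMotion) can be sketched as a small check. This is only an illustration using hand-entered values; the dictionary keys, host names, and addresses are hypothetical, and the Migrate wizard performs many more validation checks than this.

```python
import ipaddress

def vmotion_problems(src, dst):
    """Return a list of basic mismatches; an empty list means these checks pass."""
    problems = []
    if src["portgroup"] != dst["portgroup"]:
        problems.append("port group names differ")
    src_net = ipaddress.ip_interface(f'{src["ip"]}/{src["netmask"]}').network
    dst_net = ipaddress.ip_interface(f'{dst["ip"]}/{dst["netmask"]}').network
    if src_net != dst_net:
        problems.append("VMkernel ports are in different subnets")
    if not (src["vmotion_enabled"] and dst["vmotion_enabled"]):
        problems.append("vMotion is not enabled on both VMkernel ports")
    return problems

# Hypothetical values copied by hand from each host's Configuration -> Networking view.
host_a = {"portgroup": "Management Network", "ip": "10.0.0.11",
          "netmask": "255.255.255.0", "vmotion_enabled": True}
host_b = {"portgroup": "Management Network", "ip": "10.0.0.12",
          "netmask": "255.255.255.0", "vmotion_enabled": True}
host_c = {"portgroup": "vMotion", "ip": "10.0.1.12",
          "netmask": "255.255.255.0", "vmotion_enabled": True}

print(vmotion_problems(host_a, host_b))
print(vmotion_problems(host_a, host_c))
```

With these sample values the first pair passes both checks, while the second pair fails on the port group name and on the subnet.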
Here is the networking screenshot
What about the controller? Or is that just because the disk is bad?
I think this is what you meant:
"Montreal" is your "Datacenter." A Datacenter is an inventory object that can contain VM's, hosts, and clusters.
In this situation, you do not have a cluster defined, and your hosts are standalone. I'm taking a wild guess that the management port you've shown in the networking screenshot is flagged to carry vMotion traffic. Did you try to migrate your VM's to the other host? If vMotion succeeds, that means the validation checks passed and your VM's should stay online.
The controller... if you look below it, you'll see a warning on the logical volume that is managed by the controller. My guess is that the warning is rolling up to the controller, and the controller reports the degraded state as an "error."
I guess to see if vMotion works correctly, I can try migrating one of the powered-off VMs first, and I suppose I do not select the option to move the datastore?
So I need to replace the battery and the hard drive?
Thank You all so much, great help, really appreciated
Correct, do not move the VM to another datastore, just the host.
As far as your controller/disk/battery goes, I'd personally stick to migrating your VMs (if the system permits that) and running the HP diagnostics. While I'm confident you need a disk replacement (btw, are there any amber/red lights on this server's drives?), I'm not 100% sold on the "battery replacement" warning. I've seen that over and over across different vendors' disk controllers for years. The HP diagnostics should tell you the real story behind what's going on with your hardware.
OK, can I run the HP diagnostics while VMware is still running, or does the host have to be taken down to run them?
So, basically how do I run the diagnostics?
Thank You
In order to run the diagnostics you need to boot to the SPP CD (which you burn from the ISO image I linked you to). You'll get a menu when it's done booting, allowing you to access the diagnostics GUI.
A question for you: If you need to replace the drive, are you planning on getting the drive replacement from HP support? If so, I'd suggest calling HP tech support to diagnose and help you get familiar with the Proliant server environment (iLO, SmartArray, SPP/SUM).
Sounds like a good learning opportunity.
Yes it is a great learning opportunity.
I am seeing if we have a vendor we buy directly from.
I do not see the link...
The warranty has expired, so calling HP is no good, and it takes very long to get approval for anything here.
Link:
HP's online support portal will still allow you to download the latest firmware and drivers even though you are out-of-warranty.
What about the controller? Or is that just because the disk is bad?
From my experience with HP hardware over the last few years, I'd rule out a defective controller. The Alert on the controller may either be a result of the failed components or buggy HP Offline Utilities. IIRC it was the February version which displayed issues with the controller.
André