I'm an advocate of rebuilding ESX servers rather than spending hours of troubleshooting time looking for a needle in a haystack. Depending on the response time you're getting from VMware, HP, or whoever provides your VMware support, a rebuild starts to look more attractive than spending hours on the phone or days shipping logs back and forth by email with support technicians. Of course, rebuilding is only a logical decision if no underlying hardware issue is the root cause of the host failure; an ESX host rebuild won't resolve hardware problems.
I have heard from many people who agree, supporting their opinion with claims that "I can rebuild an ESX server in X number of minutes". I've performed many deployments and had a pretty good idea of what my deployment times per box were, but I had never timed myself to put my money where my mouth is. I decided to run through a build with a stopwatch. My results follow.
Hardware:
HP ProLiant DL580 G2
4x P4 XEON 2.5GHz processors (512KB L2, 1MB L3 cache)
12GB RAM
Embedded Smart Array 5i disk controller (Read/Write cache present and enabled)
2x72GB Ultra320 10kRPM drives in a hardware RAID1 mirror
4x Broadcom Gb NICs
Software:
ESX 3.5.0 build 64607 (new and improved 2/20/08 edition)
The stopwatch starts when the server begins booting from the ESX 3.5.0 CD.
0m: Boot from CD, select keyboard, mouse, manual disk partitioning, time zone selection, etc.
5m: File copy stage begins.
14m: Above file copy stage completed. Reboot.
17m: Server is booted up. Set date/time. Enable root SSH login. Manually create VMkernel vSwitch using VIC. Run 166 lines of post-install configuration scripts, including installation of ProLiant Essentials 7.9.1 agents.
27m: Scripted configuration stage above complete. Reboot.
31m: Server is booted up. Remaining configuration done with VIC: Add host to Datacenter. Double-check proper licensing. Configure bandwidth throttling on 2 portgroups. Increase virtual switch port count on VM switch. Configure software iSCSI adapter and connect to iSCSI target. Add host to Vizioncore vRangerPRO. Move host from Datacenter into Cluster.
34m: VIC configuration stage above complete. Reboot.
37m: Server is booted up and ready minus patching.
At the 37-minute mark, the server is up and ready for action, minus patching. There are currently 9 patches for ESX 3.5.0, which take additional time to apply using VMware Update Manager.
37m: VMware Update Manager remediation initiated. Host enters maintenance mode and VMotions off one running VM.
38m: Patching begins. 9 patches to install.
44m: 9 patches installed. Reboot.
47m: Server is booted up and thinks about the idle life for 8 minutes before exiting maintenance mode.
55m: Exit maintenance mode. Server is patched and ready for action.
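To see where the 55 minutes actually go, the stopwatch marks above can be tallied per stage. The following is only a rough sketch: the minute marks are taken from the log above, but the category labels (install, config, reboot, patch, wait) and the function names are my own illustrative grouping and were not part of the original measurements.

```python
from collections import defaultdict

# (start_minute, end_minute, description, category)
# Minute marks come from the stopwatch log; the category labels are an
# illustrative grouping, not part of the original measurements.
STAGES = [
    (0, 5, "Boot from CD and answer installer prompts", "install"),
    (5, 14, "File copy", "install"),
    (14, 17, "Reboot", "reboot"),
    (17, 27, "Post-boot setup and scripted configuration", "config"),
    (27, 31, "Reboot", "reboot"),
    (31, 34, "VIC configuration", "config"),
    (34, 37, "Reboot", "reboot"),
    (37, 38, "Enter maintenance mode and VMotion one VM", "patch"),
    (38, 44, "Install 9 patches", "patch"),
    (44, 47, "Reboot", "reboot"),
    (47, 55, "Idle before exiting maintenance mode", "wait"),
]


def minutes_by_category(stages):
    """Sum the elapsed minutes of every stage, grouped by category."""
    totals = defaultdict(int)
    for start, end, _description, category in stages:
        totals[category] += end - start
    return dict(totals)


def total_minutes(stages):
    """Elapsed time from the first stage's start to the last stage's end."""
    return stages[-1][1] - stages[0][0]


if __name__ == "__main__":
    totals = minutes_by_category(STAGES)
    # Print the largest time consumers first.
    for category, minutes in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{category:8s} {minutes:3d} min")
    print(f"{'total':8s} {total_minutes(STAGES):3d} min")
```

Grouped this way, the four reboots account for 13 of the 55 minutes and the idle wait before exiting maintenance mode for another 8, so roughly a third of the build is time that faster scripting alone would not recover.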
Conclusion: On slower hardware with some scripted automation, an ESX server can be completely rebuilt in under an hour. Ways to reduce the total build time include newer servers with faster bus speeds and processors, 15kRPM disk spindles, faster VirtualCenter server hardware, more extensive post-install scripted configuration, selectively installing only the needed patches rather than all patches, using a kickstart.cfg script and/or an automated deployment tool such as Altiris/HP RDP to complete the initial installation stage, capturing a host installation with patches and ProLiant Essentials agents already installed, etc.
I'd be interested in hearing about your build times and automation methods.
Jason Boche
VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)
In the past I have seen, although I was not personally responsible for it, that using some form of deployment server over PXE boot can cut the deployment times mentioned here in half: about 20 minutes to install and then run said scripts.
This can save a lot of time, and it also means that with the right configuration you don't have to actually go and stick a CD in the physical machine!
I also agree that for most issues that arise, given the nature of ESX Server and the fact that you can take a host offline without any hassle, it is just easier to rebuild than to scream at a dead host.
P.
Jason -
Nice article about the install times. I have seen similar times with installs myself, although on different hardware and without the post-install scripts. Since I am not a scripting guru, I have always had to do it manually. I would be interested to know which installs you are automating with the scripts.
I would add that the theory of rebuild vs. troubleshoot extends to many areas in I.T., for example workstations, especially when you have a standard image you can deploy. I have had techs spend 4-5 hours on a problem when a rebuild / restore of data would have taken 2 hours at most... Now, some will say that this way you never find the root cause of the problem, and that is true, but in today's world some problems are single-instance and probably user-created. For that type, I will take the lesser time.
Regards...
Hey Jas,
I am getting about the same from the time I kick off the Altiris job to when the script ends, which is right before joining VirtualCenter. I usually time this against a lunch break, so I do not have the exact time, but we are in the same ballpark.
Scripts Rule!!!!
Steve Beaver
VMware Communities User Moderator
*Virtualization is a journey, not a project.*
jamieorth, here's an example of the automated configuration scripts I run. A lot of the lines are comments or redundant, but they are there for my sanity.
The script is attached to one of my later posts below.
Jason Boche
VMware Communities User Moderator
Jason,
Send me a copy; I like to collect these!!!
Steve Beaver
VMware Communities User Moderator
Jason,
Can you send me the script? Thanks.
Cheers,
Jason,
Can you send me your script too? I would like to test it and see how fast it is compared with PXE Kickstart and UDA deployment. It's nice to see a workable automated install that doesn't require physically inserting the CD media.
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Regards,
Stefan Nguyen
Never mind, I downloaded the text file. Thanks for posting that!
Once again, thanks for sharing your work, Jason! I can't award points here???
Regards,
Stefan Nguyen
Thanks, Jason.
Hey guys, can I get some of your help? Under the "Installation and Configuration" subject, in the "ESX server booting up" thread, this guy needs help.