G'day,
I did a blog post on operational procedures, but I figured people here would both find it useful and be able to contribute to the discussion.
Here are the contents to save you jumping. Post follow-up comments with any of your own ideas or items.
VMware Operations
What are your operational procedures for your VMware environment? I often get asked, "Rod, now that I have my new VMware environment, what do I need to do to run it on an ongoing basis?" To me this comes down to two things.
Monitoring
Your monitoring system provides the following functions for you.
Ensures that you are alerted to any pending problems
Allows you to investigate the current and historical state of your environment to assist in troubleshooting
Provides uptime and usage information for management reporting
Provides capacity management projections
Here is an example list of elements for monitoring.
Free space of Datastores
Free space of Service Consoles
List of orphaned snapshots
List of long running snapshots
Size of VC database
Monitor CPU READY (ms) or CPU %READY per VM per host
Monitor %CPU BUSY percentages per VM per host
Monitor network and disk I/O usage per VM per Host
Monitor service console memory swap usage
Monitor VM balloon memory and swap usage
Host downtime reporting
Server hardware faults (power supplies, fans, IO cards, disks, CPUs, RAM)
SAN hardware faults (disks and vendor specific)
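As a rough sketch of the CPU READY item above, this is how a ready value in milliseconds can be turned into a percentage. It assumes the value is a summation over one sample interval and that the interval is the 20 second real-time one, so check the interval of the chart you are reading. The function name and the optional division by vCPU count are my own illustration, not anything from a VMware tool.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the sample interval.

    interval_s defaults to 20 seconds (assumed real-time sampling interval).
    vcpus > 1 averages the result per vCPU, which is an assumption about
    how you want to normalise multi-vCPU guests.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Interval in milliseconds, multiplied by the number of vCPUs sharing the total.
    available_ms = interval_s * 1000.0 * vcpus
    return ready_ms * 100.0 / available_ms
```

For example, `cpu_ready_percent(1000)` treats 1000 ms of ready time in a 20 second sample as a share of 20000 ms. Whatever threshold you alert on is a local decision.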
Your monitoring will certainly consist of VMware vCenter Server and also your hardware monitoring platform. Often these are supplemented by a VMware specific product like Vizioncore vFoglight, Veeam Monitor or Nimsoft.
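For the datastore free space and long running snapshot items in the list above, the checks themselves are simple once you have the data out of whatever tool you use. A minimal sketch, assuming you have already exported the inventory into plain records. The record types, the 20% free space threshold and the 3 day snapshot age are all hypothetical values for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Datastore:  # hypothetical record, filled from your own inventory export
    name: str
    capacity_gb: float
    free_gb: float


@dataclass
class Snapshot:  # hypothetical record, filled from your own inventory export
    vm: str
    name: str
    created: datetime


def low_space(datastores: List[Datastore], min_free_pct: float = 20.0) -> List[str]:
    """Return names of datastores whose free space is below min_free_pct."""
    return [
        d.name
        for d in datastores
        if d.capacity_gb > 0 and d.free_gb * 100.0 / d.capacity_gb < min_free_pct
    ]


def long_running(
    snapshots: List[Snapshot],
    now: datetime,
    max_age: timedelta = timedelta(days=3),
) -> List[Tuple[str, str]]:
    """Return (vm, snapshot name) pairs for snapshots older than max_age."""
    return [(s.vm, s.name) for s in snapshots if now - s.created > max_age]
```

Passing `now` in explicitly keeps the age check repeatable, which helps if you want to run it against yesterday's export.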
Management
Your management processes and procedures provide the following functions for you.
A list of maintenance activities to perform on a periodic basis.
A list of operational procedures on how to perform standard maintenance and troubleshooting tasks.
A change management impact matrix to detail the potential impact and risk of a particular type of change.
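A change management impact matrix can be as simple as a lookup table. Here is one possible shape for it, as a sketch only. The change types, the low/medium/high ratings and the approval rule are placeholders I made up for illustration, so substitute the ratings your own change board agrees on.

```python
# Hypothetical impact matrix: every rating here is a placeholder, not a recommendation.
IMPACT_MATRIX = {
    "vmotion_within_cluster": {"impact": "low", "risk": "low"},
    "add_lun_to_cluster": {"impact": "medium", "risk": "medium"},
    "patch_vm_template": {"impact": "low", "risk": "medium"},
    "host_maintenance_mode": {"impact": "medium", "risk": "low"},
}

LEVELS = {"low": 1, "medium": 2, "high": 3}


def needs_change_approval(change_type: str, threshold: str = "medium") -> bool:
    """Return True if the change's impact or risk reaches the threshold.

    Change types missing from the matrix are treated as needing approval,
    which is an assumed (conservative) policy.
    """
    entry = IMPACT_MATRIX.get(change_type)
    if entry is None:
        return True
    worst = max(LEVELS[entry["impact"]], LEVELS[entry["risk"]])
    return worst >= LEVELS[threshold]
```

Keeping the matrix as data rather than prose means the same table can drive both the documentation and any ticketing checks.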
Here is an example list of operational procedures.
The procedure to create a new virtual machine
The procedure to place a new virtual machine within the virtual infrastructure into a Production state. This may be identical to the physical server commissioning procedure.
The procedure to place an ESX server into and then out of maintenance mode, migrating the guests onto other ESX Server hosts.
The procedure used to contact VMware for support. It should include contact information and specify contact methods as well as means of collecting information.
The procedure to add a LUN to an existing ESX server cluster.
The procedure to patch a template used for creating virtual machines.
The procedure to create a snapshot of a virtual machine.
The procedure to revert a virtual machine to the state it was in when the snapshot was taken.
The procedure for investigating user reported virtual machine performance issues. What to check and how to respond.
The procedure to add a disk to an existing virtual machine.
The procedure to expand the size of an existing disk for a virtual machine.
The procedure to shrink a disk used by a virtual machine.
The procedure to remove a disk from a virtual machine.
The procedure to decommission a virtual machine.
The procedure to migrate (VMotion) a virtual machine between ESX Server hosts in the same ESX cluster.
The procedure to build an ESX server.
The procedure to add an ESX server into an existing ESX cluster.
The procedure to migrate a virtual machine between ESX Server hosts in different ESX clusters (i.e. between datacenters).
The procedure to confirm that a SAN link is active, to be used after a SAN link has failed and been restored.
The procedure to confirm that a network link is active, to be used after a network link has failed and been restored.
The procedure to enable the network group to troubleshoot user reported network / performance issues.
The procedure for backing up/restoring VMs (VM-level and file-level).
The procedure for backing up/restoring VirtualCenter database.
The procedure for backing up/restoring license server files (or keys).
The procedure for restoring VirtualCenter Server.
The procedure for restoring ESX hosts.
Rodos
Blog: http://rodos.haywood.org/
Rodos,
Very nice. Thanks for sharing.
Very nice... when I have some free time I will see if I can get some content added.
Carl
Hey Rodos, good list... I'm working on the same thing over on VIOPS, I wonder if we can collaborate?
So far we have a couple of guides on how to develop Standard Operating Procedures and I also have a list of changes that require SOPs.
I'm also looking for reviewers, researchers and co-authors to help with the VI3 Server Consolidation 60-point Deployment Blueprint.
Hope you've been enjoying the heatwave whilst London has been drowned in snow...
Cheers
Steve