Operational Readiness Assessment

Version 1

    Introduction

     

    In an enterprise where frameworks like ITIL v2/v3 or standards like ISO20000 are used, how does VMware Infrastructure fit into these frameworks?

     

    We know that virtualization poses some challenges to these established frameworks once one gets into the details (for example, does VMotion require an RFC?). These frameworks are comprehensive, so where does VI3 fit in, and what changes?

     

    This assessment uses a simple checklist format to test for "readiness", where readiness is a specific maturity level of service operations and management. It follows the ISO20000 format rather than ITIL v2/v3 because ISO20000 is more explicit in its requirements, whereas ITIL is too high level to check for "alignment".

     

    Please use the attached xls spreadsheet, which has a simple scoring mechanism built in (see the bottom of this document).

     

    Intended Audience

     

    VMware and Service Management professionals during the development and operation of VMware Infrastructure in enterprises that use ITIL/ISO frameworks for service management.

     

    Outline

     

    1. Operational Readiness defined

    2. How to use the assessment

    3. Assessment questions

     

    Operational Readiness defined

     

    Operational Readiness is a predetermined level of capability for service management, in this case where services run on VMware Infrastructure.

     

    Capability is measured by looking at what you do and how mature (repeatable, predictable, stable) your way of doing it is.

     

    One of the challenges for enterprise VMware Infrastructure is bringing together the VMware experts and the Service Management experts.  Often the VMware experts are unfamiliar with ITIL, and the Service Management experts are unfamiliar with VMware Infrastructure.

     

    To facilitate the discussion between these two groups and integrate VI3 into the enterprise framework, I developed an assessment checklist that the combined group can use to drive discussion and form a model of their current and desired capability.

     

    The objective is to understand the present mode of operation (PMO) as it stands today, and to work on the definition of the future mode of operation (FMO), which represents the desired level of operational readiness.
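
    As a rough illustration of comparing PMO with FMO, the sketch below ranks areas by the gap between current and desired maturity. The area names, the 0-5 maturity scale and the function name are assumptions made for this example; they are not taken from the spreadsheet.

```python
def maturity_gaps(pmo, fmo):
    """Return (area, gap) pairs where the desired level (FMO) exceeds
    the current level (PMO), largest gap first."""
    # Areas missing from the PMO are treated as maturity 0 (assumption).
    gaps = {area: fmo[area] - pmo.get(area, 0) for area in fmo}
    return sorted(
        ((area, gap) for area, gap in gaps.items() if gap > 0),
        key=lambda item: item[1],
        reverse=True,
    )

# Hypothetical maturity levels on an assumed 0-5 scale.
pmo = {"Change": 2, "Capacity": 1, "Incident": 3}
fmo = {"Change": 3, "Capacity": 4, "Incident": 3}

gaps = maturity_gaps(pmo, fmo)
for area, gap in gaps:
    print(f"{area}: close a gap of {gap}")
```

    Areas that already meet the desired level are left out, so the list doubles as a prioritized work list.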

     

    The checklist represents a combination of ISO20000/ITIL and VMware expertise.

     

    How to use this assessment

     

    The attached spreadsheet has instructions, but quite simply you mark your organization objectively according to the guidelines, moving through one sheet at a time, until you are left with the Summary that shows the results.

     

    You are encouraged to develop this spreadsheet and add your own questions to more accurately represent your organization.

     

    The checklist uses groupings similar to the ISO20000 structure, minus the Relationships section. That section could be added to cover things like VMware Support, ISV Support and Partner relationships.

     

    1.  VMware recommends that you work through this tool with your virtualization team rather than on your own, and in a meeting room with the tool displayed on a projector unit.

    An excellent way to do this is to have your team complete individual spreadsheets prior to the meeting, then amalgamate their individual answers into one sheet.

     

    2.  Check out the Example tab first; it will help you understand the simple approach and scoring technique.

    Remember this isn't a test, it's just a simple tool to help you.

     

    3.  Step through each tab, answering each question as candidly as possible.

    Remember, these questions are to help you assess your capability so you can plan effectively.

     

    4.  The exercise contains 136 questions and should not take you more than a morning or an afternoon. At the end of the exercise, save this first version for historical purposes.

    You will likely want to revise this document over time; keep a separate version each time.

     

    5.  Please fill in the feedback tab and send this anonymous document to schambers@vmware.com so VMware can update its generic database.
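
    The amalgamation suggested in step 1 could be scripted along the following lines. This is only a sketch: the 0-5 scale, the use of the median and the "spread" figure are assumptions for illustration, not the spreadsheet's own method. A large spread flags questions where the team disagrees and that are worth discussing in the meeting.

```python
from statistics import median

def amalgamate(individual_sheets):
    """Combine per-person answers into one sheet.

    Each sheet maps a question ID (e.g. "REL.4") to a score. The result
    holds the median score and the max-min spread for each question.
    """
    combined = {}
    question_ids = {qid for sheet in individual_sheets for qid in sheet}
    for qid in question_ids:
        # Skip people who left this question unanswered.
        scores = [sheet[qid] for sheet in individual_sheets if qid in sheet]
        combined[qid] = {
            "score": median(scores),
            "spread": max(scores) - min(scores),
        }
    return combined

# Hypothetical answers from three team members (assumed 0-5 scale).
sheets = [
    {"REL.4": 3, "CON.5": 1},
    {"REL.4": 4, "CON.5": 1},
    {"REL.4": 1, "CON.5": 2},
]

merged = amalgamate(sheets)
for qid in sorted(merged):
    print(qid, merged[qid])
```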

     

    Assessment Questions

     

    The questions in the attached xls spreadsheet are included below, but use the spreadsheet to score yourself.

     

    The spreadsheet has a simple maturity scoring mechanism.
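
    The spreadsheet remains the tool to score with. Purely to illustrate how a per-group summary might be derived from individual answers, here is a minimal sketch. The 0-5 scale, the percentage formula and the reading of the ID prefixes (REL, CON, RES, DEL) as the ISO20000 Release, Control, Resolution and Service Delivery groups are assumptions of this example, not a description of the spreadsheet's formulas.

```python
from collections import defaultdict

# Assumed reading of the question-ID prefixes used in the checklist.
GROUPS = {
    "REL": "Release",
    "CON": "Control",
    "RES": "Resolution",
    "DEL": "Service Delivery",
}
MAX_SCORE = 5  # assumed top of the maturity scale

def summarize(scores):
    """Return each group's score as a percentage of the maximum possible."""
    by_group = defaultdict(list)
    for qid, score in scores.items():
        prefix = qid.split(".")[0]
        by_group[GROUPS[prefix]].append(score)
    return {
        group: round(100 * sum(values) / (MAX_SCORE * len(values)))
        for group, values in by_group.items()
    }

# Hypothetical answers to a handful of questions.
summary = summarize({"REL.1": 5, "REL.2": 3, "CON.1": 4, "RES.1": 2, "DEL.1": 0})
for group, percent in summary.items():
    print(f"{group}: {percent}%")
```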

     

     

    VMware Infrastructure Operations

    REL.1

    Do you have VMware Infrastructure deployed today?

    REL.2

    Has your VMware Infrastructure design been well documented and accepted / signed off as fit for purpose?

    REL.3

    Do you have detailed test plans to cover system test, integration and operations acceptance?

    REL.4

    Do your ESX Servers have a consistent, automated build and configuration?

    REL.5

    Do you have a repeatable build process that is executed by normal server build staff, and not the “VMware experts”?

    REL.6

    Is your new ESX Server only deployed as part of a change ticket and automatically entered into the CMDB?

    REL.7

    When an ESX Server is provisioned, enhanced (e.g. more memory) or decommissioned - is your capacity database automatically updated?

    REL.8

    Do you have OLA / UC agreements/contracts, such as time-to-deliver a VirtualCenter server?

    REL.9

    Do you track metrics for releases of VMware Infrastructure?

    REL.10

    Do you analyse and improve VMware Infrastructure deployments based on your metrics?

     

     

     

    Virtual Appliance / Virtual Machine Operations

    REL.11

    Do you have Virtual Appliances / Machines deployed today?

    REL.12

    Does your company have a “VMware First” policy, where workloads are virtualized unless there is a compelling reason not to?

    REL.13

    Do you store and deploy virtual appliances using the OVF standard?

    REL.14

    Do you have a proven method, and a well-resourced team, to complete physical server to virtual server conversions in a timely manner?

    REL.15

    From request to delivery, does it take less than five business days to deliver a new virtual machine to the requestor?

    REL.16

    Do your virtual machines have a consistent, automated build and configuration?

    REL.17

    Are you managing the entire virtual machine lifecycle, including variable decommissioning (e.g. a VM that lives for 3 months or 3 years)?

    REL.18

    Do you have a cost model for virtual machines, even if you are not doing chargeback?

    REL.19

    Do you require an approved change ticket to deploy a virtual machine, and is the new virtual machine automatically entered into the CMDB?

    REL.20

    When a virtual machine is provisioned, enhanced (e.g. more memory) or decommissioned - is your capacity database automatically updated?

    REL.21

    Do you have OLA / UC agreements/contracts, such as the time-to-deliver a new virtual machine?

    REL.22

    Do you track metrics for releases of Virtual Appliances / Virtual Machines?

    REL.23

    Do you analyse and improve VA / VM deployments based on your metrics?

     

     

     

    IT Ops

    CON.1

    Do you have an IT Operations team?

    CON.2

    Does your IT Ops / Operations Center manage some aspects of your virtual infrastructure?

    CON.3

    Have your IT Ops staff had VMware Infrastructure training?

    CON.4

    Has anyone from IT Ops been seconded to the VMware team?

    CON.5

    Do IT Ops use documented Standard Operating Procedures for VMware Infrastructure?

    CON.6

    Can all IT Ops staff consistently operate VMware Infrastructure?

    CON.7

    Are IT Ops the only ones with root / administration access to VMware Infrastructure?

    CON.8

    Has VMware Infrastructure been Operationally Accepted without Operational Exceptions?

    CON.9

    Could you raise an SLA breach due to an operational issue?

    CON.10

    Can VMware Infrastructure and Virtual Appliances/Machines be deployed by IT Ops without the intervention of the VMware team?

    CON.11

    Do you have OLA/UC agreements, such as 24x7 support for VMware Infrastructure?

    CON.12

    Do you track metrics for VMware Infrastructure Operations?

    CON.13

    Do you analyse and improve VMware Infrastructure Operations based on your metrics?

     

     

     

    Configuration Ops

    CON.14

    Do you do some form of configuration management in your organization?

    CON.15

    Have you mapped your virtual CIs?

    CON.16

    In a VMware Cluster, do you map a virtual machine CI to a Cluster CI?

    CON.17

    Do your VMware CIs have a consistent structure?

    CON.18

    Can you produce a report for all of your VMware CIs?

    CON.19

    Do you have a gatekeeper person to update CIs, or do you use automatic update methods for CIs and the CMDB?

    CON.20

    When a virtual machine is provisioned, enhanced (e.g. more memory) or decommissioned - is your capacity database automatically updated?

    CON.21

    Is your new ESX Server only deployed as part of a change ticket and automatically entered into the CMDB?

    CON.22

    Do you have OLA/UC agreements, such as the % of accurate CIs?

    CON.23

    Do you track metrics for configuration management?

    CON.24

    Do you analyse and improve configuration management based on your metrics?

     

     

     

     

    Change Ops

    CON.25

    Do you do some form of change management in your organization?

    CON.26

    Does your VMware Team know what changes to VMware Infrastructure require change tickets, and whether changes require CAB approval?

    CON.27

    Does your VMware Team know the risks associated with different types of VMware Infrastructure changes?

    CON.28

    Are your change tickets consistent?

    CON.29

    Are your changes consistently successful?

    CON.30

    Do you use your normal change management system for all VMware changes?

    CON.31

    Does your Change Manager understand virtualization?

    CON.32

    Can you make planned, zero-downtime maintenance to ESX Servers inside normal business hours?

    CON.33

    Could you raise an SLA breach due to a change?

    CON.34

    Do you require an approved change ticket to deploy a virtual machine, and is the new virtual machine automatically entered into the CMDB?

    CON.35

    Do you have OLA/UC agreements, such as the time to approve a change?

    CON.36

    Do you track metrics for releases of Virtual Appliances / Virtual Machines?

    CON.37

    Do you analyse and improve VA / VM deployments based on your metrics?

     

     

     

    Incident Ops

    RES.1

    Do you do some form of incident management in your organization?

    RES.2

    Are all changes to VMware Infrastructure CIs tracked in change tickets?

    RES.3

    Are you able to correlate incidents to changes?

    RES.4

    Are you able to correlate incidents across different CIs?

    RES.5

    Is initial incident investigation handled by a team other than the VMware team?

    RES.6

    Do you have a diagram that shows who is responsible for which part of the infrastructure?

    RES.7

    Are incidents tracked in your ticketing system, and by the service desk?

    RES.8

    Does your service desk consistently pass incidents to the correct teams?

    RES.9

    Could you raise an SLA breach because of an incident?

    RES.10

    Do you have OLA/UC agreements, such as a MTTR?

    RES.11

    Do you track metrics for incident management?

    RES.12

    Do you analyse and improve incident management based on your metrics?

     

     

     

    Problem Ops

    RES.13

    Do you do some form of problem management in your organization?

    RES.14

    Have you documented the best sources of information for VMware technologies, and made this available internally to all?

    RES.15

    Do you have an established 3rd line support team?

    RES.16

    Are the 3rd line support team all VMware Certified Professionals?

    RES.17

    Does your VMware team track all known errors, fixes and workarounds in an accessible (to all) knowledge base?

    RES.18

    Do you involve technical and support experts in problem resolution?

    RES.19

    Could you raise an SLA breach due to a VMware Infrastructure problem?

    RES.20

    Do you spend time identifying consistent causes of problems and finding permanent fixes for them?

    RES.21

    Do you have OLA/UC agreements, such as a support agreement from a vendor?

    RES.22

    Do you track metrics for problem management?

    RES.23

    Do you analyse and improve problem management based on your metrics?

     

     

     

    Capacity Ops

    DEL.1

    Do you do a form of capacity management in your organization today?

    DEL.2

    Do you track and trend the utilization levels of VMware Infrastructure?

    DEL.3

    Do you forecast the capacity requirements for VMware Infrastructure?

    DEL.4

    Can you model the impact of adding capacity or using more capacity?

    DEL.5

    Can you provide capacity reports on a regular basis?

    DEL.6

    Do you have a well qualified and experienced person performing your capacity management for VMware Infrastructure?

    DEL.7

    Does your Capacity Manager understand %READY?

    DEL.8

    Does your capacity data drive your purchasing or demand decisions?

    DEL.9

    Is your capacity management updated automatically by changes to VMware Infrastructure?

    DEL.10

    Will an incident ticket be raised if a capacity alert is triggered?

    DEL.11

    Would you flag an SLA breach because of capacity issues?

    DEL.12

    Do you have SLAs for capacity?

    DEL.13

    Do you track metrics for capacity management?

    DEL.14

    Do you analyse and improve capacity management based on your metrics?

     

     

     

    Availability Ops

    DEL.15

    Do you do a form of availability management in your organization today?

    DEL.16

    Do you monitor VMware Infrastructure components for availability?

    DEL.17

    Is your VMware Infrastructure designed for appropriate availability?

    DEL.18

    Do you know the cost to the business of unavailable VMware Infrastructure?

    DEL.19

    Do you have an availability plan for VMware Infrastructure?

    DEL.20

    Do you regularly measure and report on availability?

    DEL.21

    Are planned changes checked against the VMware Infrastructure availability plan?

    DEL.22

    Does the output from VMware Infrastructure availability pass into Service Level Management?

    DEL.23

    Can you raise an SLA breach due to a VMware Infrastructure availability problem?

    DEL.24

    Are your availability plans agreed with the business and enshrined in SLAs, OLAs and Underpinning Contracts?

    DEL.25

    Do you track metrics for availability management?

    DEL.26

    Do you analyse and improve availability management based on your metrics?

     

     

     

    Security Ops

    DEL.27

    Do you have security functions in your organization today?

    DEL.28

    Is VMware Infrastructure covered by an approved security policy without exceptions?

    DEL.29

    Does your IT Security team understand virtualization?

    DEL.30

    Do you have an internal collection of the security best practices from VMware?

    DEL.31

    Do you receive VMware Security Notices to a group email address?

    DEL.32

    Have you performed a Security Risk Assessment for your VMware Infrastructure?

    DEL.33

    Do you have security procedures documented?

    DEL.34

    Do all of your VMware team follow the same security procedures?

    DEL.35

    Have IT Security approved the VMware Infrastructure design?

    DEL.36

    Have IT Security approved the VMware Infrastructure implementation?

    DEL.37

    Is VMware Infrastructure monitored by the IT Security tools?

    DEL.38

    Can you raise an SLA breach due to a VMware Infrastructure security incident?

    DEL.39

    Do you have SLAs, OLAs and UCs that cover VMware Infrastructure Security?

    DEL.40

    Do you track metrics for security management?

    DEL.41

    Do you analyse and improve security management based on your metrics?

     

     

     

    Service Level Management

    DEL.42

    Do you use SLM to align IT to the business in your organization?

    DEL.43

    Do you have a published service catalogue containing services built on VMware infrastructure?

    DEL.44

    Do you have SLAs for services built on VMware Infrastructure?

    DEL.45

    Do you have a non-technical staff member responsible for SLM (e.g. contracts / purchasing staff)?

    DEL.46

    Are your SLAs, OLAs and UCs available on an internal knowledge base?

    DEL.47

    Are changes (e.g. improved availability with HA) driven by the SLM process?

    DEL.48

    Are new services on VMware Infrastructure incorporated into the service catalogue?

    DEL.49

    Is SLM notified by your other VMware Infrastructure processes in case of any SLA breaches?

    DEL.50

    Is VMware Infrastructure monitored by the IT Security tools?

    DEL.51

    Are your SLAs, OLAs and UCs actively used and maintained?

    DEL.52

    Do you track metrics for service level management?

    DEL.53

    Do you analyse and improve service level management based on your metrics?

     

     

     

    Resources

     

     

    Author

     

    schambers