vCenter Operations - The Beginnings of VMware's Cloud Monitoring Stack

Are VMware slowly becoming the new Oracle? That was the question being asked after the backlash against their initial vRAM licensing model at the launch of vSphere 5. VMware’s fairly swift retraction ensured that any unsavoury Larry Ellison comparisons were quickly put to bed, but the episode was still a signal of intent and an indication that VMware recognised its ever-growing influence and clout. Now, as VMware make serious manoeuvres into the PaaS space, a VMware-based Cloud monitoring solution is a must. So, with this in mind, what is to be made of the huge marketing push for their customers to adopt their VM monitoring tool, vCenter Operations 5.0?

Back when VMware was considered ideal only for virtualising test and development environments, the native alarms and performance graphs of what was then called Virtual Center were more than adequate for general monitoring purposes. As the vSphere revolution took hold and the average customer’s VM count grew into the hundreds, VMware generously allowed a plethora of third-party performance and capacity tools to plug into vCenter via its SDK. Suddenly every subsequent VMworld grew not just in the number of attendees but in the number of VM monitoring vendors and tools, such as vKernel vOPS, Veeam Monitor, VMTurbo Operations Manager, Quest vFoglight and Xangati VI, to name just a few. So when, in February 2011, VMware eventually entered the monitoring space with the purchase of Integrien’s Alive and its later relaunch and rebranding as vCenter Operations Manager, there was little to distinguish it from what were already mature and, in most cases, cheaper solutions. More than a year later, on the back of a huge marketing campaign and a revamped version, vCenter Operations 5.0 is slowly gaining traction amongst end users as the VM monitoring tool of choice. But how much of this is due to its actual capabilities, as opposed to VMware driving an agenda to monopolise a clearly profitable market segment?

To answer this, the first thing to do is assess whether there is a need for such a tool, whether it is any good and what, if anything, distinguishes it from the competition. The truth is that anyone who has had to troubleshoot a VMware environment, or gauge its capacity or performance, will testify that the default tools are simply not sufficient, regardless of the size of the infrastructure. Add the fact that more and more business-critical applications are now virtualised and that virtual environments are growing at an immense rate, and an enterprise-grade performance, capacity and monitoring tool becomes a necessity.

So, looking at the vCenter Operations Manager vApp (from now on referred to as VCOPs), the first thing to note is that it collects data not only from VMware’s vCenter Server but also from vCenter Configuration Manager and third-party data sources such as SNMP. This data is then processed by the vCenter Operations Manager Analytics VM, which presents the results through a rather colourful-looking GUI. Compared to its predecessor, the most notable change in the VCOPs 5.0 GUI is its integration with vCenter’s navigation/inventory pane. This small yet effective change makes it look much more like a VMware product, as opposed to the bolt-on appearance of both Integrien and previous VCOPs versions.

Using the themes, or badges, of Health, Risk and Efficiency, the GUI organises a view of the entire infrastructure into a main dashboard that can be drilled down into root causes and further detail. With a green, yellow and red scheme, where green means good and red means bad, the badges give a quick indication of areas that are of concern or need investigating. If something shows red, a couple of clicks will drill down to the relevant VMs and their affected hosts, as well as any shared or affected datastores. Each badge also carries a score: a high number is good for the Health and Efficiency badges, whereas for the Risk badge a low score is optimum for your environment. All of this enables quicker troubleshooting in large VM environments, as issues can be pinpointed from a very high-level view down to the granular detail in just seconds.
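To make the inverted scoring concrete, here is a minimal Python sketch of how a badge score might be mapped to a traffic-light colour. It assumes a 0 to 100 score and uses hypothetical thresholds (75 and 50) and a hypothetical badge_colour function; VCOPs’ actual thresholds and scoring logic are not described here, so this only illustrates the rule "high is good for Health and Efficiency, low is good for Risk".

```python
from enum import Enum


class Colour(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Hypothetical thresholds on an assumed 0-100 scale (not VCOPs defaults).
GOOD_FROM = 75
WARN_FROM = 50


def badge_colour(badge: str, score: float) -> Colour:
    """Map a badge score to a colour.

    Health and Efficiency: a higher score is better.
    Risk: a lower score is better, so it is inverted before thresholding.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if badge == "risk":
        goodness = 100 - score
    elif badge in ("health", "efficiency"):
        goodness = score
    else:
        raise ValueError(f"unknown badge: {badge}")

    if goodness >= GOOD_FROM:
        return Colour.GREEN
    if goodness >= WARN_FROM:
        return Colour.YELLOW
    return Colour.RED
```

The same score of 90 therefore shows green on the Health badge but red on the Risk badge, which is the asymmetry the dashboard relies on.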

The Health badge identifies current problems in the system and highlights issues that require immediate resolution. Using a heatmap, it gives the end user a quick health overview of all parent and child objects, such as virtual machines and hosts, which can be rewound by up to six hours to track back trends. The Risk badge does exactly what its name suggests, using data on infrastructure stress, time remaining and capacity remaining to identify potential issues that could impact the infrastructure’s performance; it can be trended back over seven days’ worth of data. Finally, the Efficiency badge, which takes advantage of the now integrated CapacityIQ tool, is used for capacity planning, drawing on CPU, memory and disk space metrics to identify overprovisioned, under-utilised or optimally resourced VMs.
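The kind of classification the Efficiency badge performs can be sketched by comparing each VM’s usage with its allocation. This is a minimal illustration only: the VmMetrics type, the classify function and the two cut-offs (5% average use for an idle VM, 50% peak use for an oversized resource) are hypothetical and are not how CapacityIQ or VCOPs actually compute efficiency.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, not VCOPs or CapacityIQ defaults.
IDLE_RATIO = 0.05       # average use below 5% of allocation on every resource
OVERSIZED_RATIO = 0.5   # peak use below 50% of allocation on any resource


@dataclass
class VmMetrics:
    name: str
    allocated: dict[str, float]  # resource -> allocated amount, e.g. {"cpu_mhz": 4000}
    peak_used: dict[str, float]  # resource -> peak observed usage
    avg_used: dict[str, float]   # resource -> average observed usage


def classify(vm: VmMetrics) -> str:
    """Label a VM from its usage-to-allocation ratios (CPU, memory, disk, ...)."""
    avg = [vm.avg_used[r] / vm.allocated[r] for r in vm.allocated]
    peak = [vm.peak_used[r] / vm.allocated[r] for r in vm.allocated]
    if max(avg) < IDLE_RATIO:
        return "under-utilised"      # barely used on any resource
    if min(peak) < OVERSIZED_RATIO:
        return "overprovisioned"     # at least one resource never nears its allocation
    return "optimally resourced"
```

Checking the idle case first keeps a VM that is barely running from being reported merely as oversized.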

As well as the badges and their drill-down details, VCOPs has several menu tabs, such as Operations, Planning, Alerts, Analysis and Reports. Of most interest in the Operations tab is the Environmental section, where objects such as the associated vCenter Server, datacenters, datastores, hosts and virtual machines are represented visually alongside their scores and relationships. This is an excellent feature that lets the end user quickly drill down to, identify and investigate more granular objects of concern and their health status. The Planning tab also contains a very useful summary section that provides a visual overview, in graphs and tables, of the capacity of any selected object, letting you easily switch between deployed and remaining capacity. Here VCOPs can forecast remaining capacity up to several months ahead, an essential value-add, especially as environments grow at such a radical pace.
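How VCOPs builds its forecasts isn’t described here, so the sketch below swaps in the simplest possible stand-in: a least-squares straight line fitted to daily usage, projected forward to the point where it meets capacity. The days_until_full function and its inputs are hypothetical; a real capacity tool would account for seasonality, peaks and headroom policies.

```python
from typing import Optional, Sequence


def days_until_full(daily_used: Sequence[float], capacity: float) -> Optional[float]:
    """Fit a linear trend to daily usage samples and estimate the days left
    (after the last sample) until the trend reaches capacity.

    Returns None when usage is flat or shrinking, i.e. no exhaustion forecast.
    """
    n = len(daily_used)
    if n < 2:
        raise ValueError("need at least two daily samples")

    # Ordinary least squares with x = 0, 1, ..., n-1 (day index).
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    # Day index at which the fitted line crosses capacity, relative to today.
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))
```

For example, a datastore that grew from 100 GB to 140 GB in steps of 10 GB a day over five days would, on this trend, fill a 200 GB capacity in 6 more days.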

In addition to the capacity planning and forecasting features, it’s also good to see VCOPs incorporate what-if scenarios. Now common amongst several VM monitoring tools, what-if scenarios are a useful addition to any VM environment, as they let you foresee the impact of a change on the capacity and workload of your virtual environment before actually making it.
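At its simplest, a what-if scenario adds a hypothetical workload to current usage and checks the result against a headroom policy. The sketch below assumes a cluster described only by CPU and memory totals, a hypothetical 80% utilisation ceiling and hypothetical function names; VCOPs’ own scenario modelling is richer than this.

```python
import math
from dataclasses import dataclass

MAX_UTIL = 0.8  # hypothetical headroom policy: keep each resource at or below 80%


@dataclass
class Cluster:
    cpu_ghz: float
    mem_gb: float
    used_cpu_ghz: float
    used_mem_gb: float


def what_if_add_vms(cluster: Cluster, count: int, vm_cpu_ghz: float,
                    vm_mem_gb: float, max_util: float = MAX_UTIL) -> dict:
    """Projected utilisation if `count` identical VMs were added, without changing anything."""
    cpu_util = (cluster.used_cpu_ghz + count * vm_cpu_ghz) / cluster.cpu_ghz
    mem_util = (cluster.used_mem_gb + count * vm_mem_gb) / cluster.mem_gb
    return {
        "cpu_util": cpu_util,
        "mem_util": mem_util,
        "fits": cpu_util <= max_util and mem_util <= max_util,
    }


def max_additional_vms(cluster: Cluster, vm_cpu_ghz: float, vm_mem_gb: float,
                       max_util: float = MAX_UTIL) -> int:
    """How many more VMs of this profile fit before either resource hits the ceiling."""
    by_cpu = (max_util * cluster.cpu_ghz - cluster.used_cpu_ghz) / vm_cpu_ghz
    by_mem = (max_util * cluster.mem_gb - cluster.used_mem_gb) / vm_mem_gb
    return max(0, math.floor(min(by_cpu, by_mem)))
```

For a 100 GHz / 512 GB cluster already using 40 GHz and 256 GB, adding ten 2 GHz / 8 GB VMs leaves it at 60% CPU and about 66% memory, within the ceiling; memory rather than CPU limits the addition to 19 such VMs.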

Finally, the area in which VCOPs really stands out from the competition is its new and unique vCenter Infrastructure Navigator feature. Recognising that, for any business, monitoring solutions that look at the performance of its applications rather than just its infrastructure are far more attractive, VMware has introduced vCenter Infrastructure Navigator to automatically discover application services and map their dependencies and relationships. One of the main benefits of knowing the interdependencies between applications and the virtual infrastructure is that it immediately helps reduce MTTR by either eliminating or implicating the infrastructure as the cause of an application slowdown. Furthermore, as key applications and their underlying infrastructure are continually identified and monitored, the end user can quickly ensure that the right level of resources is allocated and that priority is given to the VMs that actually need it.
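The "eliminate or implicate" reasoning can be shown with a small dependency graph. In this sketch the object names, the DEPENDS_ON map and the helper functions are all hypothetical; it does not reproduce how vCenter Infrastructure Navigator discovers or stores dependencies, only how such a map could be queried once it exists.

```python
from collections import deque

# Hypothetical dependency map: each object -> objects it runs on or depends on.
DEPENDS_ON: dict[str, list[str]] = {
    "app:webshop": ["vm:web01", "vm:db01"],
    "app:crm": ["vm:crm01"],
    "vm:web01": ["host:esx01", "ds:lun01"],
    "vm:db01": ["host:esx02", "ds:lun02"],
    "vm:crm01": ["host:esx02", "ds:lun01"],
}


def dependencies(obj: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning everything obj depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(graph.get(obj, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def infrastructure_suspects(app: str, unhealthy: set[str],
                            graph: dict[str, list[str]]) -> set[str]:
    """Unhealthy components the app relies on; an empty set rules the infrastructure out."""
    return dependencies(app, graph) & unhealthy


def impacted_apps(component: str, graph: dict[str, list[str]]) -> set[str]:
    """Applications that depend, directly or indirectly, on the given component."""
    return {o for o in graph if o.startswith("app:") and component in dependencies(o, graph)}
```

If host:esx01 is red and the CRM application is slow, infrastructure_suspects returns an empty set, pointing the investigation back at the application itself; the same red host would implicate the infrastructure for the web shop.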

When you put this in the context of disaster recovery, and more specifically VMware’s latest version of Site Recovery Manager, end users now have the opportunity to create recovery plans and protection groups aligned to the applications that reside on their VCOPs-monitored VMs. This is a far cry from the competition, whose equivalent disaster recovery solutions still don’t allow you to automatically fail back, or even to fail over multiple VMs simultaneously. Using VCOPs’ metrics and its mapping of application interdependencies with VMs and underlying hosts, disaster recovery planning becomes significantly more sophisticated, as it is now tied to what matters most to the business, namely the apps.

So while this all sounds great, does VCOPs really spell the end of other VM monitoring solutions, and a consequent reduction in third-party stalls and their scantily clad glamour models at VMworld? Does it really constitute a comprehensive Cloud monitoring solution? At present, probably not. With its per-VM pricing model, VCOPs is still more expensive than most of its competitors, and it still has some limitations, most significantly its inability to monitor physical servers in the same way it monitors VMs. It also has to win market share from popular, seasoned solutions that already have established users and champions. That said, this is VCOPs 5, and it is merely the beginning.

Looking first at the price challenge, VCOPs is software, and it would be foolhardy not to expect its pricing model to change and become more attractive to new customers, or even to be bundled with new hypervisor purchases. Looking at the bigger picture, VMware are clearly focusing further up the stack with a PaaS offering, and it would also be short-sighted to think VMware see VCOPs only as a standalone product that just monitors the infrastructure space. If anything, it’s an investment in what will become an integral component of a comprehensive Cloud monitoring package that enables successful migrations to VMware’s PaaS offerings. In such a scenario, it would be ideal for VMware to have a PaaS offering already built on an IaaS monitoring, management and orchestration solution that they themselves have developed. Furthermore, should the next version of VCOPs, or the package it comes with, include analytics that incorporate physical blades, we could well have an integrated monitoring tool that is impossible to compete with. Just imagine being able to run a what-if scenario on a physical blade prior to virtualising it, so that you can size its resources accurately based not just on current metrics but also on analysed and predicted growth.

So, taking a step back and looking at the whole VMware portfolio, it seems that the heavy investment in and marketing of VCOPs is more a ploy to eventually tie many of their separate solutions into a single, comprehensive management, orchestration and Cloud monitoring package managed entirely from the vCenter interface. Currently VMware’s portfolio is littered with separate solutions, including vCloud Director and vCenter Chargeback Manager, but if these were fully integrated with VCOPs they would make a tasty introductory package for those looking to deploy a Private Cloud. Then there’s VMware’s Hyperic, which could close the aforementioned physical gap, as it can monitor the physical environment underlying vSphere and hence provide performance management of applications in both physical and virtual environments. It is therefore not impossible, even today, to monitor a Cloud infrastructure’s components with a bolted-together Hyperic and VCOPs solution: Hyperic monitors the applications and processes running on vCloud Director and conveys this to VCOPs, which monitors the VMs and, consequently, other components such as vShield Manager. But for VMware to be successful in the PaaS sphere, they need to enter and engage with a market segment they’ve had little exposure to, i.e. the application owners. VMware’s vFabric Application Performance Manager (driven by AppInsight), as part of a fully integrated package of VCOPs, vCloud Director, Hyperic and so on, could be the key that opens the door to application owners. It would also provide VMware with a true Cloud monitoring solution offering real-time visibility and control, from application to infrastructure, via a single management interface.

Ultimately this requires a lot of development work and effort from VMware, and will eventually bring them into competition with a new breed of vendors that specialise in Cloud management and monitoring. The point is that, however good VCOPs and its competitors are, it's important not to become blinkered to the bigger picture of what VCOPs may eventually become part of. Either way, VCOPs’ current competitors need to up their game quickly or find alternative features for their solutions: to survive in this game, it’s clear you’re either big or niche.