VMware Cloud Community
gravesg
Enthusiast

vcops vs esxtop

Hi guys.

I'm currently looking at vcops 5 Advanced Edition and evaluating it against vkernel. I'm finding that vkernel seems better at helping resolve issues from the last "5 minutes" in the environment, whereas vcops seems much better at trending and capacity analysis. Could it be that the product is still designed in such a way that esxtop remains the go-to tool for analysis within the hour? Basically if an app owner taps you on your shoulder with "my app is slow", what would you open first, second, then third? vcenter, esxtop then vcops?

kitcolbert
VMware Employee

Hi,

First, I'm curious what you found in your evaluation that makes vKernel seem better than VC Ops at helping with "5 min" issues. Any specifics you can add?

Second, to your question, I like to think about it like this: VC Ops is a monitoring, troubleshooting, and remediation tool. It monitors your entire environment, trending the behavior of individual objects as well as looking for connections within an object. It collects data from VC at a five-minute resolution and thus does give you information about issues happening "now". Esxtop is a great troubleshooting tool, but it's not very useful for monitoring or remediation. Really, its bread and butter is troubleshooting a single ESX host. If you know you have a problem, if you're sure it's isolated to that one host, and if you understand all the metrics displayed there, then esxtop is great. But that's a lot of ifs.

Let's take your example.  A user says they have a problem.  If you open esxtop, what do you do?  Which numbers do you look at and how do you know those numbers are "good" or "bad"?  Where do you start, especially if this "app" is actually a multi-tier application running on many VMs on many different hosts?

With VC Ops, you could model the app as a collection of VMs (if applicable) and quickly know whether the VMs, the host(s) the VMs are running on, and the datastore(s) backing the VMs are behaving normally or not. You can also quickly tell if any of those is overcommitted on resources (CPU, memory, disk I/O, etc.). There are a number of different visualizations to help you look across all the related components in your environment to see where the problems are. It will also enable you to answer the "victim vs. villain" question: is something in the VMs taking up resources, or is it an infrastructure problem?
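
To make the "model the app as a collection of VMs" idea concrete, here is a minimal Python sketch, assuming you already have some list of objects your monitoring tool flags as abnormal. The VM class, the related_objects/suspects helpers and the object names are all hypothetical illustrations, not the VC Ops API. The point is only that the app's footprint (its VMs plus the hosts and datastores they touch) is the first thing to check, even when it spans several hosts.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical record of one VM and the infrastructure it depends on."""
    name: str
    host: str
    datastores: list[str] = field(default_factory=list)


def related_objects(app_vms: list[VM]) -> tuple[set[str], set[str]]:
    """Return the hosts and datastores that the app's VMs run on or use."""
    hosts = {vm.host for vm in app_vms}
    datastores = {ds for vm in app_vms for ds in vm.datastores}
    return hosts, datastores


def suspects(app_vms: list[VM], abnormal: set[str]) -> list[str]:
    """List every VM, host, or datastore in the app's footprint that the
    monitoring tool has flagged as abnormal (sorted for stable output)."""
    hosts, datastores = related_objects(app_vms)
    footprint = {vm.name for vm in app_vms} | hosts | datastores
    return sorted(footprint & abnormal)


if __name__ == "__main__":
    app = [VM("web01", "esx-a", ["ds1"]), VM("db01", "esx-b", ["ds2"])]
    # esx-b hosts db01, so it is in the app's footprint; ds9 is not.
    print(suspects(app, {"esx-b", "ds9"}))  # ['esx-b']
```

With a multi-tier app spread over many hosts, this narrows "my app is slow" to a short list of suspects before anyone opens esxtop on a single host.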

These are the first questions you want to ask when a user says "my app is slow".  If after a bit of troubleshooting you see that it's isolated to a single host, then perhaps you could switch to esxtop.  But honestly many of the questions you may have can still be answered by VC Ops.

So, this is a long-winded way of answering your question: the first tool I'd recommend using is *always* VC Ops. There are so many high-level questions that need to be answered before you do any deep-dive troubleshooting, and VC Ops is designed to help you answer them as well as give you the next level of depth to boot. This matters even more as your environment grows: when you're managing 10,000 VMs and 1,000 hosts, tools like esxtop and even VC won't scale. VC Ops gives you a bird's-eye view of your environment and allows you to drill down to specific problem areas.

Hope that helps.

Thanks,

Kit

gravesg
Enthusiast

I admit that I'm still finding my way around the vcops interface, and I can't navigate it as quickly as a seasoned user yet.

However, I've found that with vkernel, an issue happening right now is reflected as soon as you look at an object, and correlation to an underlying subsystem or to related issues with other VMs that are part of the distributed application is a few clicks away. With vcops, the algorithms are such that even if there is an anomaly or high stress, the health of an object can still be 98. Effectively, you have to click through all the badges to get an idea of what's going on. This is probably where Enterprise Edition shines, sure, but we are a small shop and are focused on Advanced Edition only.

Real example: I had a VM with major ready time issues on a heavily oversubscribed host. vkernel tells me this immediately, but vcops sort of dances around the issue and never really gives me the smoking gun.

But you are certainly right: esxtop is not meant to scale. It seems this product might be better suited to huge shops, but as a small shop I wanted a tool to help me be proactive as well as reactive.

kitcolbert
VMware Employee

I didn't mean to say that vC Ops is best only for large enterprises; I think it's great for small shops as well. I was just giving an example of how esxtop can be limiting if not used correctly.

In any case, thanks for the additional information about your evaluation. vKernel shows the basic, raw stats collected from vSphere. VC Ops, as you know, does it differently. We compute derived stats (badges like Workload) and also run analytics to determine trends (e.g., Anomalies). In the case of Health, we're really balancing Workload against Anomalies: only when both are problematic will you see Health degrade. High ready time will influence Workload; however, if the high ready time is normal, Anomalies will not be influenced. Health will therefore stay high, because the behavior is expected. This is just an example and I'm not sure about the actual data in your case, but the point is that Health takes into account both stats like CPU usage and CPU ready and how normally those stats are behaving.
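
The "only when both are problematic" rule can be illustrated with a toy score. This Python sketch is an assumption for illustration only: the thread does not give the actual VC Ops Health formula, and the toy_health function, its 0 to 100 scales and the use of min() are hypothetical. It simply shows how a heavy but normal Workload (high ready time that is expected, so low Anomalies) can leave Health at something like 98.

```python
def toy_health(workload: float, anomalies: float) -> float:
    """Toy Health score from 0 to 100 (higher is healthier).

    Hypothetical: not the real VC Ops formula. workload and anomalies are
    0-100 scores where higher means worse. Using min() means Health only
    drops when BOTH are high: a high Workload whose behavior is normal
    (low Anomalies) leaves Health high.
    """
    for label, value in (("workload", workload), ("anomalies", anomalies)):
        if not 0 <= value <= 100:
            raise ValueError(f"{label} must be between 0 and 100")
    return 100 - min(workload, anomalies)


if __name__ == "__main__":
    print(toy_health(90, 2))   # heavy but expected load: 98
    print(toy_health(90, 85))  # heavy AND unusual load: 15
```

A combination like this is deliberately forgiving of steady-state stress, which is why a chronically oversubscribed host can look healthy while its Workload badge is high.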

In terms of "dancing around the issue", I can understand why you'd think that. VC Ops has taken a different approach from most other performance management tools in that we don't just show the raw stats collected from vCenter. We actually do some processing of them to try to make better sense of them. For instance, you know to look for CPU ready, and that's great. But are you looking at CPU co-stop? And what about the equivalent of CPU ready for memory? Do you know what stat that is and where to look for it?

What we've tried to do with VC Ops is create a common set of terms for performance management. For instance, we have the term "contention", which measures the impact of overcommitment. CPU Contention is computed using CPU ready, CPU co-stop, etc. We also have a Memory Contention that uses the right stats there. (And, btw, these Contentions are rolled into Workload.) The point is that you don't need to know every single little stat exposed by VC and what it means; instead, you can look at our standardized set of stats. So yes, it may seem like we're dancing around the issue by not directly showing CPU ready, but you need to be looking at a lot more than just CPU ready. Rather than deluge you with a bunch of stats (with new ones to learn in each ESX release as more and more stats are added), we created a standard terminology that's applicable across all objects and resource types.
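
To see why CPU ready alone can understate the problem, here is a toy Python sketch. vCenter reports CPU ready and CPU co-stop as summations in milliseconds per sample interval (20 s for real-time charts, 300 s for a five-minute rollup), so a percentage is the waiting time divided by the interval. The thread does not give the actual VC Ops Contention formula; toy_cpu_contention, simply adding ready and co-stop time, and the vcpus divisor (for counters summed across all vCPUs) are assumptions for illustration.

```python
def toy_cpu_contention(ready_ms: float, costop_ms: float,
                       interval_s: int = 300, vcpus: int = 1) -> float:
    """Toy CPU 'contention' percentage for one VM over one sample interval.

    Hypothetical: not the real VC Ops formula.
    ready_ms:   CPU ready summation (ms vCPUs were ready but not scheduled)
    costop_ms:  CPU co-stop summation (ms vCPUs were held back by co-scheduling)
    interval_s: sample length; 300 s = five-minute rollup, 20 s = real-time
    vcpus:      divide by this when the counters are summed across all vCPUs
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    capacity_ms = interval_s * 1000 * vcpus  # total vCPU time in the interval
    return (ready_ms + costop_ms) / capacity_ms * 100


if __name__ == "__main__":
    ready_only = toy_cpu_contention(15_000, 0)
    combined = toy_cpu_contention(15_000, 15_000)
    print(f"ready only: {ready_only:.1f}% with co-stop: {combined:.1f}%")
```

Here the same five-minute sample reads 5.0% if you look only at ready time but 10.0% once co-stop is counted, which is the kind of gap a single combined "contention" term is meant to close.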

Anyway, the point is that the data is there and is actually very front and center, but it's different from what you've seen before.

Thanks,

Kit

arosemblat
Contributor

gravesg,

Interesting observations from trying to solve issues in your environment. The answer to your question from your initial post above:

Basically if an app owner taps you on your shoulder with "my app is slow", what would you open first, second, then third? vcenter, esxtop then vcops?

would be to first open VKernel vOPS and select the Performance vScope details page for that VM to get a plain-English description of what is going on and how to fix it (and then, when available, click the button to implement the remediation step automatically).

While the systems mentioned above have been designed to help manage a virtualized environment, there are differences in approach and development philosophy that lead to different results. VKernel’s product DNA is to deliver purpose-built tools that provide straightforward, actionable answers.  These features have been refined organically over the past five years through several releases that incorporate feedback from customers.

Of course, the best way to distinguish which approach will work the best in your environment is to identify the solution areas that are the most important, and then try out both systems head-to-head to evaluate the results. Don’t hesitate to reach out if we can provide any assistance during your trial period – arosemblat@vkernel.com.

Alex Rosemblat

Product Marketing Manager

VKernel

A Quest Company
