VMware Cloud Community
edinburgh1874
Enthusiast

VDP CPU Usage

Hi All,

I recently deployed a VDP appliance to replace VDR in our 5.1 environment, backing up through an FC EMC CX3-40f SATA array.

As soon as I set up the appliance (which was a nightmare due to the DNS error message!), CPU usage shot up to 2-3 GHz and stayed that way throughout the day. Bear in mind this was with no backup job configured, and there was little disk I/O during this period.

I logged in to the appliance using SSH and, using top, found that the Java processes are taking up most of the CPU, along with intermittent bursts from "gsan".

I then set up and kicked off a backup of our environment after hours, and saw that the CPU usage shot up to 10 GHz. I had to cancel it to keep it from affecting other VMs.

I understand that this product will require heavy resources due to the dedup, but is this really normal behaviour? It's maxing out an HP G7 blade at the moment!

I have tried combinations of 1 to 4 vCPUs, upping the memory to 16 GB and updating VMware Tools.

Anyone else seeing similar behaviour, or have any advice?

17 Replies
a_nut_in
Expert

Hey EB,

I think this is what you are facing.

http://kb.vmware.com/kb/2041068

I think this would explain the Java/Tomcat high CPU issue.

If this does not help, you might want to involve VMware to check whether you could be running into any other known, or as yet undocumented, issues :)

Regards

a

Do remember to mark my post as "helpful" or "correct" if I've helped resolve or answer your query!
edinburgh1874
Enthusiast

Thanks for your reply. Yes, it looks like that's the issue. Unfortunately, a reboot resolves it for a while, but it always comes back.

Decided to look at an alternative product as VDP is not ready for production use.

The main problem is the resource requirements, which are apparently normal for this software.

I tested it with one backup job and it was using 6 GHz.

There were other things as well, such as poor logging, slow boot time (1 hour for us), password complexity requirements and the DNS error bug during install, that made me decide it's not ready for use yet.

escapem2
Enthusiast

My issue is that VDP causes high CPU usage in my vCenter appliance, and I don't know why. I opened a case with VMware and we reconfigured as they advised. It worked OK for a month, and then the same issue came back: high CPU usage in vCenter caused by VDP.

edinburgh1874
Enthusiast

VDP is ideally meant to be run on its own hardware. I had it running on an old HP G1 blade because of its resource requirements. It's based on EMC Avamar, which I believe was shipped with its own hardware appliance.

I received a call from VMware stating that VDP will take up as many resources as it possibly can, so if you are running it in a DRS cluster you should put a cap on its CPU/memory resources. VMware doesn't do this out of the box. The cap won't affect operation of the appliance, but backup jobs will take more time to complete.

See my post here

http://communities.vmware.com/message/2228577#2228577

I ended up looking into alternatives, as VDP was causing too many issues in my environment.

MartinSvec
Enthusiast

I can confirm that the VDP appliance has enormous CPU usage even when it's supposed to be idle. We're testing VDP 5.1.10 in our lab now and there's no improvement regarding CPU 😞 Our 2 TB test VDP backs up two Windows 2008 R2 VMs and one Windows 2012 VM. All three VMs are clean installations with no activity. VDP reports 16 GiB deduped size and 100 GiB non-deduped size. I would expect such a basic test setup to cause close-to-zero CPU usage almost all the time. However, the VDP performance charts look as follows:

vdp-cpu-usage.png

vdp-read-rate.png

That is, VDP uses about 1.8GHz CPU on average, with 3.7GHz peak every 5 minutes :-(( There's also disk read peak every 5 minutes.

The 5-minute peaks in CPU and Read Rate charts are caused by gsan+java processes. All other CPU peaks are caused by java processes, with a periodic contribution from avmaint processes. Below are the corresponding "top" captures:

vdp-gsan+java-peak.png

vdp-java-peak.png

vdp-java+avmaint-peak.png

All the charts and captures were taken in the middle of the maintenance window. I don't think it's the KB2041068 issue because restarting VDP does not help.

Question: is this common, expected behavior for VDP? If so, what the hell does the appliance do every 5 minutes, and why are the management java/tomcat processes so CPU hungry? Is there any way to reduce CPU usage when no backup/maintenance tasks are running? (I know it's possible to set CPU/IOPS limits on the VDP VM, but that results in longer backup times.) There are numerous efforts to reduce servers' power consumption by keeping CPUs in deep idle states as long as possible -- perhaps VMware/Avamar missed these trends?? I estimate that the constant energy needs + higher HW requirements + consequent cooling increase the daily operating costs of VDP by about 25-50% compared to VDR :-(( And it's a big question whether the promised higher dedupe ratios can compensate for these operating costs.

Martin

PhSLU
Enthusiast

I have the exact same problem.

When VDP is started, my VCSA CPU usage goes through the roof (about 10x more than when VDP is not running). By running, I don't mean that it is backing anything up, just that the appliance is started.

I have also opened a call with VMware, but no luck so far (not even a temporary fix).

What did they propose in your case?

vmatt89
Contributor

PhSLU,

Are you running the latest version of VDP 5.5? Just checking to see if this problem still persists.

Thank you

PhSLU
Enthusiast

Yes, we are running VDP 5.5.

And yes, the problem persists. We have an open call with VMware, who in turn opened a call with EMC for Avamar. It's under investigation :(

matteowiz
Contributor

Hello,

I've read that VDP performs a storage performance check every 5 minutes.

However, no one has mentioned whether there is a way to disable it.

Regards,

--

matteo

PhSLU
Enthusiast

Do you know how I can disable this check?

I still have no reply from VMware.

syn4ck
Contributor

The read performance check can be disabled by changing perfinterval from its default of 300 to 0:

avmaint config --ava perfinterval=0

However, this is unsupported, VDP may stop working, and it won't solve your high CPU usage on the VCSA.

The VCSA CPU load comes from the high number of login/logout events.

You can check how many login/logout events you have in the vpx_event table on the VCSA:

First, from a shell on the VCSA, connect to the PostgreSQL server: /opt/vmware/vpostgres/9.0/bin/psql -U vc VCDB

Next, count all events in the vpx_event table: select count(*) from vpx_event;

Then count the login/logout events: select count(*) from vpx_event where event_type in ('vim.event.UserLogoutSessionEvent','vim.event.UserLoginSessionEvent');

To log out of PostgreSQL, simply type \q

Don't try to delete rows directly from this table!

A simple workaround to reduce CPU usage on the VCSA is to lower the event retention policy to 1 day. This is supported.

A more complex one is to use a patched cleanup_events_tasks_proc to delete user login and user logout events each day. Of course, this is unsupported by VMware.
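
If you want to see which event types dominate the table rather than only the login/logout total, a read-only breakdown can be scripted. The following is only a sketch: it assumes Python 3.7+ is available wherever you run it, it reuses the psql path, user and database name from the commands above, and the helper names (psql_command, parse_counts, top_event_types) are made up for illustration. It only reads from vpx_event and never modifies it.

```python
import subprocess

# Path, user and database name are taken from the commands in this post.
PSQL = "/opt/vmware/vpostgres/9.0/bin/psql"

# Read-only query: number of rows per event type, largest first.
TOP_EVENT_TYPES_SQL = (
    "select event_type, count(*) as n from vpx_event "
    "group by event_type order by n desc limit 10;"
)


def psql_command(sql):
    """Build the argv for a one-off psql query (hypothetical helper).

    -A (unaligned) and -t (tuples only) make the output easy to parse;
    -c runs a single command and exits.
    """
    return [PSQL, "-U", "vc", "-A", "-t", "-c", sql, "VCDB"]


def parse_counts(output):
    """Parse 'event_type|count' lines (psql -A -t output) into tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event_type, count = line.rsplit("|", 1)
        rows.append((event_type, int(count)))
    return rows


def top_event_types():
    """Run the query on the VCSA and return [(event_type, count), ...]."""
    result = subprocess.run(
        psql_command(TOP_EVENT_TYPES_SQL),
        check=True, capture_output=True, text=True,
    )
    return parse_counts(result.stdout)
```

Calling top_event_types() from a shell on the VCSA would return the ten most frequent event types with their row counts. If the two session event types sit at the top by a wide margin, that matches the behaviour described above.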

matteowiz
Contributor

Hi, thank you for your response. I was unaware of the perfinterval option. I'm going to start testing with a higher value there, such as 3000/6000. Is it supported? Can you be more specific here?

I don't have the VCSA high CPU usage problem, only very high I/O from the VDP appliances every 5 minutes throughout the day.

Regards,

--

matteo

PhSLU
Enthusiast

Thank you very much for this info.

I was on a call with VMware and EMC (Avamar) today ... they didn't know as much!!!

I am very reluctant to lower the event retention policy on the VCSA, as I sometimes have to search back through the history to understand what happened to the vCenter.

But maybe I'm mistaken, because I don't seem to see any related events in the vCenter GUI.

In my case:

/opt/vmware/vpostgres/9.0/bin/psql -U vc VCDB
psql.bin (9.0.13)
Type "help" for help.

VCDB=> select count(*) from vpx_event;
 count
--------
 988582
(1 row)

VCDB=> select count(*) from vpx_event where event_type in ('vim.event.UserLogoutSessionEvent','vim.event.UserLoginSessionEvent');
 count
--------
 856803
(1 row)

VCDB=> \q
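
To put those two counts in proportion, here is the arithmetic as a small sketch. The numbers are copied from the psql session above; the variable names are only illustrative.

```python
# Counts copied from the psql session above.
total_events = 988582
login_logout_events = 856803

# Share of the table taken up by login/logout session events.
share = login_logout_events / total_events * 100
other_events = total_events - login_logout_events

print(f"login/logout share: {share:.1f}%")  # prints 86.7%
print(f"all other events:   {other_events}")  # prints 131779
```

So roughly 87% of the rows in vpx_event are session login/logout events, and only about 130,000 rows are anything else.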


PhSLU
Enthusiast

@syn4ck Could you tell me how to run "patched_cleanup_events_PostgreSQL.sql" just once (and then check the counts again as I did)? Also, if I decide to use your patch, how should I schedule it? Thank you in advance.

syn4ck
Contributor

You can see logins and logouts in real time from the vSphere Client by selecting your vCenter Server:

vdp_events.png

To replace the cleanup_events_tasks_proc procedure, copy the patched_cleanup_events_PostgreSQL.sql file to /root/ on the VCSA, then load it from psql by typing \i /root/patched_cleanup_events_PostgreSQL.sql

Then all you have to do is wait a few hours (up to 24).

The original procedure can be restored with \i /usr/lib/vmware-vpx/sql/cleanup_events_PostgreSQL.sql

syn4ck
Contributor

@matteowiz

I found this option in the Avamar 6.1 technical addendum.
You should ask VMware or Avamar/EMC support before changing the configuration.
