LatinSuD
Enthusiast

VDP: Out of Memory and configuration not saved

Yesterday I manually renamed a few virtual machines (whose names contained special characters) and reconfigured the backup jobs with the new names.

During the night, no backups ran at all.

This morning I rebooted the appliance and found that the changes had not been saved, i.e. the virtual machines I had added were no longer in the backup jobs.

I reconfigured the backup jobs again and launched a simple test job (containing only one VM). It DID NOT WORK. The task got stuck at about 30% and never even reached the VM snapshot step.

After checking the logs I found an out-of-memory (OOM) condition, which sounds odd because I had already upgraded the appliance to 6GB of RAM.
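For anyone hitting the same thing, entries like the ones below are easy to pull out with grep (a minimal sketch; /var/log/messages is where they landed on my appliance):

# Show oom-killer invocations and the resulting process kills
grep -n 'oom-killer\|Out of memory\|Killed process' /var/log/messages

# Or follow the log live while re-running the test job
tail -f /var/log/messages | grep --line-buffered oom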

- VDP has:
  - 2TB disk
  - 6GB RAM
  - 4GB swap
- Datacenter has:
  - ~200 VMs in total
  - ~100 VMs covered by backup jobs
  - 10 backup jobs

This is the beginning of the OOM logs in /var/log/messages:

May 10 11:45:55 vdp -- MARK --

May 10 11:49:18 vdp kernel: [ 1899.735068] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0

May 10 11:49:18 vdp kernel: [ 1899.735077] java cpuset=/ mems_allowed=0

May 10 11:49:18 vdp kernel: [ 1899.735081] Pid: 11286, comm: java Tainted: G           X 2.6.32.49-0.3.1.3755.0.PTF-default #1

May 10 11:49:18 vdp kernel: [ 1899.735083] Call Trace:

May 10 11:49:18 vdp kernel: [ 1899.735149]  [<ffffffff810061dc>] dump_trace+0x6c/0x2d0

May 10 11:49:18 vdp kernel: [ 1899.735167]  [<ffffffff8139b076>] dump_stack+0x69/0x73

May 10 11:49:18 vdp kernel: [ 1899.735189]  [<ffffffff810b8e3c>] oom_kill_process+0xcc/0x2f0

May 10 11:49:18 vdp kernel: [ 1899.735204]  [<ffffffff810b94c0>] __out_of_memory+0x50/0xa0

May 10 11:49:18 vdp kernel: [ 1899.735208]  [<ffffffff810b96a8>] out_of_memory+0x198/0x210

May 10 11:49:18 vdp kernel: [ 1899.735213]  [<ffffffff810bcc66>] __alloc_pages_slowpath+0x4b6/0x5f0

May 10 11:49:18 vdp kernel: [ 1899.735220]  [<ffffffff810bceda>] __alloc_pages_nodemask+0x13a/0x140

May 10 11:49:18 vdp kernel: [ 1899.735227]  [<ffffffff810c039e>] __do_page_cache_readahead+0xce/0x220

May 10 11:49:19 vdp kernel: [ 1899.735235]  [<ffffffff810c050c>] ra_submit+0x1c/0x30

May 10 11:49:19 vdp kernel: [ 1899.735239]  [<ffffffff810b70f3>] filemap_fault+0x3c3/0x3d0

May 10 11:49:19 vdp kernel: [ 1899.735244]  [<ffffffff810cfd77>] __do_fault+0x57/0x520

May 10 11:49:19 vdp kernel: [ 1899.735252]  [<ffffffff810d46f9>] handle_mm_fault+0x199/0x430

May 10 11:49:19 vdp kernel: [ 1899.735260]  [<ffffffff813a07df>] do_page_fault+0x1bf/0x3e0

May 10 11:49:19 vdp kernel: [ 1899.735268]  [<ffffffff8139e0ff>] page_fault+0x1f/0x30

May 10 11:49:19 vdp kernel: [ 1899.736442] DWARF2 unwinder stuck at page_fault+0x1f/0x30

May 10 11:49:19 vdp kernel: [ 1899.736444]
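If anyone wants to compare numbers, the actual memory and swap situation while a job runs can be checked with the usual commands (nothing VDP-specific here):

free -m       # RAM and swap totals/usage, in MB
swapon -s     # per-device swap usage
vmstat 5      # memory/swap activity sampled every 5 seconds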

2 Replies
LatinSuD
Enthusiast

I upgraded to 8GB of RAM.

This process alone is eating 6GB of RAM:

/usr/local/avamarclient/bin/avvcbimage

Note that for this test I'm only running one job, covering one single VM.

Is this kind of memory consumption normal?
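Here is roughly how I watched it grow, in case anyone wants to reproduce (the process name is from the path above; the 5-second interval is arbitrary):

# Print PID, resident and virtual memory of avvcbimage every 5 seconds
while true; do
    ps -C avvcbimage -o pid,rss,vsz,cmd
    sleep 5
done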

LatinSuD
Enthusiast

No way: the process was eating all available memory just to back up a powered-off VM with only 100MB of disk.

I was also getting these warnings in mcserver.log:

WARNING: com.avamar.mc.sdk.McsFaultMsgException: E22234: Group does not exist.

WARNING: com.avamar.mc.sdk.McsFaultMsgException: E22289: Retention policy does not exist.
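Counting those warnings is straightforward once the log is located (I'm guessing at the mcserver.log location with find here; I haven't verified the exact path):

# Locate mcserver.log (path varies by install, hence the find)
MCLOG=$(find /usr/local -name 'mcserver.log*' 2>/dev/null | head -1)

# Count each warning code seen above
grep -c 'E22234' "$MCLOG"
grep -c 'E22289' "$MCLOG"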

In the end, it seems to have fixed itself. All I did was edit a few jobs, and backups worked again (not sure if that's related).

Later I also removed all backup jobs, upgraded to 6.1.81, and created the jobs again.

Still getting those warnings, though.
