VMware Cloud Community
ChrisGurley
Enthusiast

Snapshot creation takes 1min per GB of RAM -- need community testing help

Hey vGeeks,

I have a long-running case with VMware Support (#12148282302) about the behavior we observe when a snapshot is created in vSphere on a fully patched ESXi 5 host. I'll list our environment specs below, but the gist is that when I create a snapshot of a VM (doesn't matter if busy or totally idle), the VM becomes unresponsive to varying degrees for roughly a minute, and the snap then takes approximately one minute per gig of RAM (in the VM) to complete. To accentuate the issue, I created a new VM with nothing running in it (other than W2k8R2) with 16GB of RAM, and it takes 14-18 min to create a snap. This happens in both of our clusters/sites.

Where I need YOUR help:

If you have a bit of time (and I know this is a big ask), either test with an existing VM or with a new one and post back your stats and hardware config when creating a snapshot (with memory capture). Quiescing is irrelevant; this happens when the memory is captured on the snap regardless of anything else.

Our environment:

  • Dell Poweredge R810s (quad Intels, 256GB RAM), R710s (dual Intels, 96GB RAM)
  • Redundant FC connections to SAN through Brocade and Cisco fiber switches
  • HP 3PAR T400 and V400 arrays running latest InForm OS revs and with 100+ FC 15K drives
  • VMs running latest VMware Tools and latest Windows patches on W2k8R2; also happens on Ubuntu w/ high RAM

I appreciate any help/validation. We don't believe this happened on ESXi 4.x, but we're open to any explanation.

Thanks,

Chris

4 Replies
J1mbo
Virtuoso

I'd troubleshoot this by looking at the disk performance as seen by the hosts and indeed in the guests. Using Iometer on an affected VM, what sequential write throughput are you seeing with, say, 32k IOs and a queue depth of 8? And at what rate do Storage vMotion events run?

ChrisGurley
Enthusiast

Hey J1mbo,

Already did all that (Iometer, etc.) with Support about 3 months ago (we've been working on this since February). Per them, it isn't an I/O issue, and it happens whether the VM is on an isolated host or even writing to local storage (though local storage does seem to be worse, so there could be some correlation).

Support is currently trying to explain the unresponsiveness part of the problem (which lasts for roughly a minute regardless of RAM, and up to the 30ish% mark on creation progress) with the following KB article: http://kb.vmware.com/kb/1013163. I'll grant that some minor unresponsiveness or stunning is relevant to this, but not to the degree we're seeing.

All that said, and not to ignore your questions, we couldn't get much useful info out of Iometer: the VM is being repeatedly stunned, and during those unresponsive periods it is literally frozen, so no stats are recorded. What metric (GB/min?) would you like me to give you for the Storage vMotion rate? IMO, we have really good performance with that.

Thanks,

Chris

vonsch
Contributor

Seeing the same thing on ESXi 4.1

Machines with 16GB of RAM take approx 20 minutes to snapshot when including the memory, regardless of workload.

Confirmed on 3 different VMs.

R910s with iSCSI EqualLogic PS600E.

Have you gotten anywhere with this?

ChrisGurley
Enthusiast

vonsch,

That's good to know about 4.1. Perhaps we never snapped large VMs prior to 5.0 and thus thought it began with it. Even so, it shouldn't take a minute per gig to write the memory to disk (IMO).
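
To put a rough number on that, here is a minimal sketch of the average write rate implied by the timings reported in this thread. It assumes the whole of the VM's configured RAM is written out during the snapshot and that 1 GB means 1024 MiB; the function name and the list of observations are only illustrative.

```python
def implied_write_rate_mib_s(ram_gib: float, minutes: float) -> float:
    """Average rate (MiB/s) needed to write ram_gib of memory in the given time.

    Assumption: the full configured RAM is written to disk during the snapshot.
    """
    return ram_gib * 1024 / (minutes * 60)


# Timings quoted in this thread (RAM size in GB, duration in minutes).
observations = [
    ("16 GB VM, 14 min (ESXi 5)", 16, 14),
    ("16 GB VM, 18 min (ESXi 5)", 16, 18),
    ("16 GB VM, 20 min (ESXi 4.1)", 16, 20),
    ("1 GB per minute", 1, 1),
]

for label, ram_gib, minutes in observations:
    rate = implied_write_rate_mib_s(ram_gib, minutes)
    print(f"{label}: {rate:.1f} MiB/s")
```

Under those assumptions the reported timings work out to somewhere around 14 to 20 MiB/s of sustained writes, which is the figure worth comparing against measured Iometer or Storage vMotion throughput on the same datastore.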

The latest from support is that this is by design and they are working (at a very low priority, it seems) to repro it and collaborate with engineering on it. It's not hard to repro (obviously) but I still haven't heard a technical reason for why the mem snap lasts that long. If I hear anything, I'll post it here.

--Chris
