VMware Cloud Community
udp666
Contributor

Taking a Snapshot with constant traffic

Dear Experts,

I have an ESX 4.0 machine set up, and it receives a constant stream of syslog (UDP) and TCP events.

I use snapshots for backup and restore purposes.

I came across the KB article titled "Taking a snapshot with virtual machine memory stuns the virtual machine while the memory is written to disk".

The article mentions ESX 3.x, but the sidebar lists product versions 4.0 and 4.1.

Question: will I lose events during the snapshot? I'm less worried about the TCP events, because those are stored on a collector and resent in the event of downtime. The UDP events will definitely be affected, and that is what I am trying to avoid. I did a quick test, running a ping test while taking a snapshot, and only 1 packet was dropped.
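A ping test typically sends only one packet per second, so a single lost ping says little about a busier syslog stream. A minimal sketch of a finer-grained test, assuming a hypothetical sender (not shown) that puts an incrementing ASCII sequence number in each UDP datagram: the receiver records the numbers it sees, and the gaps show how many datagrams went missing while the snapshot ran. The port and timeout are illustrative assumptions.

```python
import socket


def missing_sequence_numbers(received):
    """Return sequence numbers absent between the lowest and highest seen.

    Datagrams lost after the last one received cannot be detected this way.
    """
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def receive_sequence_numbers(port, count, timeout=5.0):
    """Collect up to `count` datagrams whose payload is an ASCII integer."""
    seqs = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        sock.settimeout(timeout)
        try:
            while len(seqs) < count:
                data, _addr = sock.recvfrom(2048)
                seqs.append(int(data.decode("ascii").strip()))
        except socket.timeout:
            pass  # stop early if the (hypothetical) sender goes quiet
    return seqs


if __name__ == "__main__":
    # Offline demo: datagrams 4, 5 and 6 never arrived.
    print(missing_sequence_numbers([1, 2, 3, 7, 8]))
```

Running the receiver while taking a snapshot, then passing its result to missing_sequence_numbers, gives a direct count of lost datagrams rather than an inference from ping.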

Any information is greatly appreciated.

Thanks,

udp666

marcelo_soares
Champion

For backup purposes you don't need to worry, as the memory is not snapshotted; doing so isn't useful for backups, because only the disk matters.

In any case, snapshotting disk and memory should not cause any problems for your network stack. The lost ping you saw is expected; I have never had any problems with this.

Marcelo Soares
SimonStrutt
Enthusiast

VMs still get briefly stunned during snapshot operations, even if the VM's memory doesn't need to be written to disk.

Normally the stuns involved when snapshots are applied and removed/deleted are short, around a second or less, and are unlikely to mean missed UDP traffic, though the length of these pauses will depend on your environment.

To get an idea of what is normal for your environment, look in the logs of similar VMs (i.e. the vmware.log file in the same directory as the VM's .vmx and .vmdk files). There's normally one stun for creating/starting a snapshot and numerous ones for deleting/ending one. The log entries look like this (grep/search for Checkpoint_Unstun):

May 27 10:47:59.931: vmx| Checkpoint_Unstun: vm stopped for 365815 us (i.e. about 0.37 secs)
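A minimal sketch for summarising those entries instead of reading them by eye, assuming every stun line has exactly the wording shown above (other ESX builds may phrase it differently, so treat the pattern as an assumption). It converts the microsecond figure to seconds and reports the count, longest and total stun time.

```python
import re
import sys

# Assumed format, taken from the example entry above:
# "... Checkpoint_Unstun: vm stopped for 365815 us"
UNSTUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_durations(lines):
    """Yield each stun duration, in seconds, found in vmware.log lines."""
    for line in lines:
        match = UNSTUN_RE.search(line)
        if match:
            yield int(match.group(1)) / 1_000_000


def summarise(path):
    """Print how many stuns a log contains, the longest, and the total."""
    with open(path, errors="replace") as log:
        durations = list(stun_durations(log))
    if not durations:
        print("no Checkpoint_Unstun entries found")
        return
    print(f"{len(durations)} stuns, longest {max(durations):.3f} s, "
          f"total {sum(durations):.3f} s")


if __name__ == "__main__" and len(sys.argv) > 1:
    summarise(sys.argv[1])  # e.g. python stuns.py /path/to/vmware.log
```

Looking at the longest value across several similar VMs gives a better sense of the worst case than a single snapshot does.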

However, on rare occasions they can last longer (for example, because the host, VM, or storage is busy). I've seen up to 10 seconds, and it could be worse if your infrastructure is really struggling. In that case you're going to drop any syslog messages that arrive during the stun.
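As a rough worked estimate, messages lost is about arrival rate multiplied by stun length. This sketch assumes a steady arrival rate and treats every datagram arriving during the stun as dropped, which is a worst-case assumption (some may be queued by the host). The 200 messages/second rate is hypothetical.

```python
def estimated_lost_messages(rate_per_sec, stun_seconds):
    """Worst-case estimate: every message arriving during the stun is dropped."""
    return round(rate_per_sec * stun_seconds)


# Hypothetical 200 msg/s syslog stream, comparing a typical and a bad stun.
for stun in (0.4, 10.0):
    lost = estimated_lost_messages(200, stun)
    print(f"{stun:>5} s stun at 200 msg/s -> about {lost} messages")
```

At that rate a typical 0.4 s stun costs on the order of 80 messages, while a 10 s stun costs around 2000.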

"The greatest challenge to any thinker is stating the problem in a way that will allow a solution." - Bertrand Russell