Dear Experts,
I have an ESX 4.0 machine set up, and it receives a constant stream of syslog (UDP) and TCP events.
I use snapshots for backup and restore purposes.
I came across the KB titled "Taking a snapshot with virtual machine memory stuns the virtual machine while the memory is written to disk".
The document mentions ESX 3.x, but the sidebar lists product versions 4.0 and 4.1.
Question: Will I lose events during the snapshot? I'm less worried about the TCP events because those will be stored on a collector and resent in the event of downtime. The UDP events will definitely be impacted, and this is what I am trying to avoid. I did a quick test: while I took a snapshot I also ran a ping test, and I only had 1 dropped packet.
Any information is greatly appreciated.
Thanks,
udp666
For backup purposes you don't need to worry, as the memory is not snapshotted. There is no point in doing that because only the disk matters.
Anyway, snapshotting disk and memory should not cause any problems for your network stack. The lost ping you saw is expected; I have never had any problem on this point.
VMs still get briefly stunned during snapshot operations even if the VM's memory doesn't need to be written to disk.
Normally the stuns involved when snapshots are created and removed/deleted are short, say around a second or less, and are unlikely to mean missed UDP traffic, though the length of these pauses will depend on your environment.
To get an idea of what is normal for your environment, look in the logs of similar VMs (i.e. the vmware.log file in the same directory as the VM's vmx and vmdk files). There's normally one stun for creating/starting a snapshot, and numerous stuns for deleting/ending one. The log file entries will look like this (grep/search for Checkpoint_Unstun)...
May 27 10:47:59.931: vmx| Checkpoint_Unstun: vm stopped for 365815 us (ie 0.4 secs)
However, they can on rare occasions last longer (because the host, VM, or storage is busy, for example). I've seen up to 10 secs, and it could be worse if your infrastructure is really struggling. In that case you're going to drop syslog messages that needed to be handled during that time.
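If you'd rather not eyeball the grep output, here is a rough Python sketch that pulls the stun durations out of a vmware.log and summarises them. It assumes the entries look like the Checkpoint_Unstun line quoted above; other ESX builds may word the line differently, so treat the pattern as an assumption. The function names (stun_durations, summarize) are just ones I made up for illustration.

```python
import re

# Assumed format, taken from the log line quoted in this thread:
#   May 27 10:47:59.931: vmx| Checkpoint_Unstun: vm stopped for 365815 us
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_durations(lines):
    """Return every stun duration found in `lines`, in seconds, in log order."""
    durations = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            # The log reports microseconds; convert to seconds.
            durations.append(int(match.group(1)) / 1_000_000)
    return durations


def summarize(durations):
    """Return count, longest and mean stun in seconds, or None if there were none."""
    if not durations:
        return None
    return {
        "count": len(durations),
        "max_s": max(durations),
        "mean_s": sum(durations) / len(durations),
    }


# Small demonstration using the line from this thread plus an unrelated line.
sample = [
    "May 27 10:47:59.931: vmx| Checkpoint_Unstun: vm stopped for 365815 us",
    "May 27 10:48:00.102: vmx| some unrelated log entry",
]
print(summarize(stun_durations(sample)))
```

To run it against a real log, open the file and pass the file object in, e.g. `stun_durations(open("vmware.log", errors="replace"))`. The max_s figure is the one to watch: if it is well under a second, syslog arriving over UDP will most likely sit in the guest's buffers until the VM resumes, whereas multi-second values are where drops become plausible.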