VMware Cloud Community
jkoebrunner
Enthusiast

VMware Snapshot Performance & VSAN -> Losing network connectivity

UPDATE:

We have narrowed it down: it's somehow connected to snapshots and VSAN.

The network outage only happens on the VSAN datastore, not on the NetApp NFS datastore.

Any ideas?

Hey,

We have a problem with VMware snapshots: network connectivity to the VM is temporarily lost during snapshot creation and deletion.

Test cases:

- Snapshot Creation and Removal (with Quiescing) on a W2012R2 VM running on ESXi/VSAN 5.5 (2718055):

     - PING the VM with HRPING every 250ms

     - We lose about 54 packets ~ 13 seconds during creation

     - We lose about 35 packets ~ 9 seconds during deletion

- Snapshot Creation and Removal (without Quiescing) on a W2012R2 VM running on ESXi/VSAN 5.5 (2718055):

     - PING the VM with HRPING every 250ms

     - We lose about 10 packets ~ 2 seconds during creation

     - We lose about 14 packets ~ 3 seconds during deletion
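As a rough cross-check of the figures above, the outage length can be estimated as lost packets times the ping interval. This is only a sketch: it assumes the lost pings were consecutive, and the function name `outage_seconds` is made up for illustration.

```python
def outage_seconds(lost_packets: int, interval_ms: int = 250) -> float:
    """Approximate outage length, assuming the lost pings were consecutive."""
    return lost_packets * interval_ms / 1000.0


# Packet counts taken from the test cases above (250 ms ping interval).
measurements = {
    "creation, with quiescing": 54,
    "deletion, with quiescing": 35,
    "creation, without quiescing": 10,
    "deletion, without quiescing": 14,
}

for name, lost in measurements.items():
    print(f"{name}: {lost} packets lost, about {outage_seconds(lost):.2f} s")
```

The results (13.5 s, 8.75 s, 2.5 s, 3.5 s) are in line with the rounded values quoted in the test cases.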

Can anybody explain this behaviour? Is there any way to optimize snapshots?

What I see is that quiescing causes a longer outage than a snapshot without quiescing. Why?

I would assume that with quiescing only the filesystem gets stunned, not the whole OS.

Losing network connectivity is unexpected and not acceptable for real-time applications.

Grateful for any further ideas on this...

Johannes Köbrunner IT Solutions Architect Virtualization, Network and Storage Systems Frequentis AG VTSP, VCP, VCAP-DCD
7 Replies
zdickinson
Expert

Do I understand that if you snapshot a VM running on vSAN you lose connectivity until the snapshot is taken?  Then when you delete a snapshot you lose connectivity until the snapshot is removed?  But you don't see this behavior from a VM on your NetApp storage?

What is the vSAN storage networking setup?

Do the snapshot operations take significantly longer on vSAN?

Are the vSAN hosts also the hosts connected to NetApp?  What I'm wondering is whether vSAN is OK but there are North-South networking issues between wherever you are pinging from and the VM.  Can you run two VMs on the same host, set up a ping from one to the other, and then take a snapshot?  If you don't lose pings, I would look at the North-South traffic.  Thank you, Zach.
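For a VM-to-VM test like this, it can help to turn the raw ping results into outage windows so the same-host and North-South runs can be compared directly. The following is a minimal sketch, not tied to HRPING's output format: it assumes you have already collected `(timestamp_seconds, reply_received)` pairs by some means, and `outage_windows` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def outage_windows(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (start, end) for each run of failed pings.

    start is the timestamp of the first lost ping in a run; end is the
    timestamp of the first successful reply after it, or of the last
    sample if the run never recovers.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends with this reply
            start = None
        last_ts = ts
    if start is not None and last_ts is not None:
        windows.append((start, last_ts))    # still down at end of capture
    return windows


# Illustrative data only: pings every 250 ms, two of them lost.
samples = [(0.0, True), (0.25, False), (0.5, False), (0.75, True), (1.0, True)]
for begin, end in outage_windows(samples):
    print(f"outage from {begin:.2f} s to {end:.2f} s ({end - begin:.2f} s)")
```

Running the same capture once between two VMs on one host and once from the external client would show whether the outage window follows the snapshot or the network path.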

crosdorff
Enthusiast

VSAN 5.5 and snapshots are problematic.

We have the same problem here with an MS SQL cluster, where timeouts happen during snapshot creation and removal for Veeam.

The whole snapshot system should be much better in VSAN 6.0.

jkoebrunner
Enthusiast

Yes, your understanding is correct.

I have retested the scenario with both VMs on the same VSAN host: PING is lost for about 10 seconds during snapshot creation.

From my point of view, it is not related to network issues because we have experienced this behaviour in many customer projects in combination with VDP backup doing snapshots.

The NetApp system is a different system, I just wanted to have a comparison between snapshotting a VM on NetApp and a VM on VSAN.

I know that VSAN 6.0 will provide better-performing snapshots. Nevertheless, we have VSAN 1.0 in place for many customers, where we will not upgrade but still want to use VDP for backup.

zdickinson
Expert

I have not seen this behavior, but I do agree that 6.0 should provide much better snapshots.  Thank you, Zach.

jkoebrunner
Enthusiast

Unfortunately, that is not an answer that really helps us out of this problem.

The next step will be to open a support request.

jkoebrunner
Enthusiast

Update: This issue also exists with vSphere 6.0 / VSAN 6.0, although it got better with the new release.

I am surprised that nobody else is experiencing a short network outage while creating and deleting snapshots for Windows 2012R2 VMs (with and without quiescing).

justinbennett
Enthusiast

We are using Veeam with our VSAN on vSphere 6.0 and are also seeing slow snapshot creation. We lose one to a few packets during creation, depending on the size of the VM's memory and the host's utilization.

Here are some of Veeam's tips: http://www.veeam.com/kb1681

  • Check the VM for snapshots while no job is running and remove any that are found.
  • Check for orphaned snapshots on the VM. (See: http://kb.vmware.com/kb/1005049)
  • Reduce the number of concurrent tasks that are occurring within Veeam, this will in turn reduce the number of active snapshot tasks on the datastores.
  • Move VM to a datastore with more available IOPS, or split the disks of the VM up into multiple datastores to more evenly spread the load.
  • If the VM's CPU resources spike heavily during snapshot consolidation, consider increasing the CPU reservation for that VM.
  • Ensure you are on the latest build of your current version of vSphere, hypervisors, VMware Tools and SAN firmware when applicable.
  • Move VM to a host with more available resources.
  • If possible, change the time of day that the VM gets backed up or replicated to a time when the least storage activity occurs.
  • Use a workingDir to redirect Snapshots to a different datastore than the one the VM resides on. http://kb.vmware.com/kb/1002929
  • Disable VMware Tools Sync driver on the VM: http://kb.vmware.com/kb/1009886