VMware Cloud Community
pgoggins
Enthusiast

Hardware refresh and slow snapshots

This past week we completed the migration of our vSphere 5 environment to new hardware, and while most things are quite a bit faster, snapshots are now horribly slow.

All nodes before and after the migration were running vSphere 5 Update 1.

Old hardware:
DL380 G5, dual X5450, 64GB RAM, 2x quad-port gigabit NICs + onboard gig (4x for iSCSI SAN, 4x for data, and 2x for Service Console & vMotion)

New hardware:

DL380p Gen8, dual E5-2690, 256GB RAM, dual-port 10GbE LOM and dual-port 10GbE add-on card (2x for iSCSI/Service Console/vMotion and 2x for data)

The SAN is an HP LeftHand P4500 cluster, which has not changed during this process other than the move to a 9000-byte MTU.

The network path is: DL380 ---(2x 10GbE iSCSI)---> Switch ---> SAN cluster (2x 1GbE per node)

Snapshots used to take under 40 seconds to complete on the old hardware and now take 2 to 5 minutes; merging (consolidation) takes roughly the same magnitude longer as well. The other oddity I've noticed is that while a snapshot is being taken, pings to the given VM jump from <1 ms to somewhere between 200 ms and 500 ms.

----------------------------------------------------------- Blog @ http://www.liquidobject.com
3 Replies
Gkeerthy
Expert

Since you changed the MTU to 9000, did you verify that the physical switch and the storage are capable of handling it? Try reducing the MTU and check the speed again. Also, a higher MTU won't always give higher performance; the switch needs enough forwarding rate and processing power to keep up with jumbo frames.
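One way to check the MTU at each hop from the ESXi side is with the standard esxcli/esxcfg commands; a minimal sketch, where vSwitch1 and vmk1 are placeholders for the actual iSCSI vSwitch and VMkernel port:

```shell
# List vSwitches with their configured MTU
esxcfg-vswitch -l

# List VMkernel interfaces and their MTU
esxcli network ip interface list

# If a hop is still at 1500, raise it on both the vSwitch and the vmknic
# (vSwitch1 / vmk1 are placeholders)
esxcfg-vswitch -m 9000 vSwitch1
esxcli network ip interface set -i vmk1 -m 9000
```

The physical switch ports and the P4500 interfaces need a matching (or larger) MTU as well, or oversized frames will simply be dropped.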

Please don't forget to award points for 'Correct' or 'Helpful' if you found the comment useful. (vExpert, VCP-Cloud, VCAP5-DCD, VCP4, VCP5, MCSE, MCITP)
vCloud9
Enthusiast

I would first check and make sure that jumbo frames are enabled all the way through: the physical storage switch, the virtual switch, the VMkernel port, and the storage itself. Also verify with vmkping -s <MTU size> -d <IP address of SAN> from the ESXi host. This VMware KB covers all the steps involved in enabling jumbo frames except the physical (storage) switch; make sure you didn't miss any of them.

I would also take a look at the storage switch and make sure there are no issues there.

Here is another VMware KB to help you identify any performance related issues related to storage using ESXTOP.
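For the latency side, esxtop's disk adapter view is usually enough to see where the time is going; a short sketch (the capture file name is just an example):

```shell
# Interactive: press 'd' for the disk adapter view and watch
# DAVG/cmd (device/SAN latency), KAVG/cmd (kernel queuing),
# and GAVG/cmd (latency as seen by the guest)
esxtop

# Or capture in batch mode for later analysis:
# 2-second samples, 30 iterations
esxtop -b -d 2 -n 30 > esxtop-capture.csv
```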

I just realized that your new hosts are equipped with 10Gb NICs. Is your storage also equipped with 10Gb?

-vCloud9


Please don't forget to award points for 'Correct' or 'Helpful' if you found the comment useful.
pgoggins
Enthusiast

The switch fabric CPU on the pair of 5400zl switches peaks at 2% utilization.

~ # vmkping -s 8972 10.0.0.247 -d
PING 10.0.0.247 (10.0.0.247): 8972 data bytes
8980 bytes from 10.0.0.247: icmp_seq=0 ttl=64 time=1.082 ms
8980 bytes from 10.0.0.247: icmp_seq=1 ttl=64 time=0.614 ms
8980 bytes from 10.0.0.247: icmp_seq=2 ttl=64 time=0.809 ms

Each of the P4500 nodes responded as the above did.

The current cluster is running 2x 1Gb ALB per node, with 4 nodes at site A and 4 at site B in a multi-site cluster.

Checking via ESXTOP shows:

vmhba34 -                      22   407.43   172.60   202.33     1.83     1.07     1.35     0.01     1.36     0.00

CMDS/s peaked at 415 but usually stayed under 200.

After digging further into the active/active clustering of the P4500 cluster, I found I needed to adjust the LUN I/O balancing beyond plain round robin via:

for i in `esxcli storage nmp device list | grep naa.600` ; do esxcli storage nmp psp roundrobin deviceconfig set -t iops -I 1 -d $i; done

Now the two iSCSI VMkernel ports are balancing more evenly. Peak CMDS/s is up to around 900 and the normal average is hovering slightly above 200. There is still a delay, but it's a little better than before.
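For anyone repeating this, the per-device setting can be confirmed afterwards; a sketch, with the naa ID being a placeholder for one of your LUNs:

```shell
# Show the round robin config for one device (naa ID is a placeholder);
# after the change above, the IOOperation Limit should read 1
esxcli storage nmp psp roundrobin deviceconfig get -d naa.600xxxxxxxxxxxxxxxx
```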

----------------------------------------------------------- Blog @ http://www.liquidobject.com