VMware Cloud Community
admin
Immortal

Is Our Server Network Configuration Holding Us Back?

Hi All,


I am trying to determine whether increasing the number of network ports in our environment will make any difference to performance, either for VM network traffic or for our iSCSI VMFS datastores. I do not think we are currently experiencing any bottlenecks, but sometimes I suspect our environment is not running as efficiently as it should.


Here is a bit of detail. We have 12 hosts: two with 288 GB of RAM and 2 x 2.93 GHz CPUs, and another ten with 144 GB and 2.40 GHz. All of our servers are equipped with 14 NIC ports. In the past, when we were running 19 hosts (72 GB each), we scaled back our port density; currently we run 5 ports per server: 2 x VM network + SC, 2 x SAN VMFS + VM iSCSI, and 1 x vMotion. We are running 480 VMs in this environment. Some people would look at that configuration and say it is no good, that it needs to be 2 ports per service (2 x SC, 2 x vMotion, 2 x iSCSI, 2 x SAN VMFS, 2 x VM network, etc.). That is what we used to run, and we were not even hitting 10% utilization on our links, so I scaled back to reduce the cabling. I can't say I have noticed a real performance change. However, that was two years ago.

We are implementing six new servers, each with 384 GB. I am thinking it might be worth expanding the network ports to 4 for SAN and 4 for VM traffic, and I am trying to figure out how to determine whether our hosts' network configuration is actually a bottleneck and whether adding ports would be advantageous. With our current configuration I see at most 15% utilization across the network interfaces, with just two NICs for SAN/iSCSI and two for VM traffic. What I am unsure of is whether those two links are introducing any kind of latency.

I know that, for the most part, more links means less chance of congestion and better throughput, but what I am trying to figure out is whether my two links per service (2 x SAN VMFS + iSCSI, 2 x VM traffic) are imposing performance limitations on us currently. I am not entirely sure which network performance metrics I should be looking at to indicate whether adding ports would help.
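To make the utilization question concrete, here is a rough way to turn interface byte counters (the kind you could pull from esxtop batch output) into a percentage figure. The counter values and interval below are made-up illustrations, not measurements from this environment:

```python
# Rough link-utilization check from two byte-counter samples on one NIC.
# On ESXi the counters would come from esxtop batch output; the numbers
# here are invented for illustration.

LINK_BPS = 1_000_000_000  # one 1 GbE link

def utilization_pct(bytes_start, bytes_end, interval_s, link_bps=LINK_BPS):
    """Average utilization of one link over a sampling interval, in percent."""
    bits = (bytes_end - bytes_start) * 8
    return 100.0 * bits / (interval_s * link_bps)

# Example: 1.5 GB moved over a 60-second window on one 1 GbE NIC.
u = utilization_pct(0, 1_500_000_000, 60)
print(f"{u:.1f}% average utilization")  # prints "20.0% average utilization"
```

The catch is that an average like this can hide microbursts: a link showing 15% over a minute may still saturate for milliseconds at a time, which is where queuing latency and flow-control pauses come from. So alongside utilization, it is worth watching for pause frames and drops on the switch ports and storage latency on the host.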

Our back end is EqualLogic (5 arrays: two PS6000XVs in one pool and three PS6500Xs in another). Our switches are Cisco 3750s with stacking, on an isolated iSCSI network. All 1 Gbps, all jumbo frames, flow control enabled.
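For what it's worth, the jumbo-frame gain can be sketched as per-frame payload efficiency. This assumes plain IPv4/TCP with no header options and standard Ethernet framing overhead; the header sizes are the usual textbook ones, not anything measured on this network:

```python
# Payload efficiency of TCP traffic (e.g., iSCSI) at different MTUs.
# Assumes IPv4 + TCP with no options (20 B each) and standard Ethernet
# framing overhead: preamble 8 + header 14 + FCS 4 + inter-frame gap 12 = 38 B.

def tcp_payload_efficiency(mtu):
    """Fraction of wire bandwidth carrying TCP payload at a given MTU."""
    ip_tcp_headers = 20 + 20
    ethernet_overhead = 8 + 14 + 4 + 12
    payload = mtu - ip_tcp_headers
    return payload / (mtu + ethernet_overhead)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.2%} payload on the wire")
# MTU 1500 comes out around 95%, MTU 9000 around 99% -- a real but modest win.
```

The larger practical benefit of jumbo frames is usually fewer frames (and so fewer interrupts and less per-packet CPU work) rather than the raw efficiency delta, and it only holds if every hop honors the MTU end to end.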

Any pointers?

Thanks

4 Replies
HeathReynolds
Enthusiast

If you dedicate more links to vMotion and turn on multiadapter vMotion you will see a big difference in your vMotion performance.

Everything else just depends on how busy your environment is...

It wouldn't be a bad idea to split your VMFS traffic from your VM iSCSI traffic.

Do you have two 3750 stacks, or are you running a single stack?

My sometimes relevant blog on data center networking and virtualization : http://www.heathreynolds.com
Josh26
Virtuoso

Heath Reynolds wrote:

If you dedicate more links to vMotion and turn on multiadapter vMotion you will see a big difference in your vMotion performance.

In the real world, how often does anyone ever say "the performance of this vMotion is a problem"? I haven't used multiadapter vMotion outside the lab, because even the biggest VMs still seem to migrate in just a few seconds.

Regarding the original query, if in doubt, all you can do is look at utilisation.

It's incredibly common in the physical world to have all your servers connected to one switch, with only one or two uplink ports (so 2 Gbps max) actually connecting those servers to anywhere else.

Then suddenly, virtualization came around, and there seems to be this trend of throwing 15 pNICs at a host based on some view that it's needed for performance. It's not unusual for me to see a client with four VMs, and six pNICs. All connected to one switch.

I've even seen a 100Mbit switch used for iSCSI, with the associated IT manager explaining he used eight NICs for performance. Or the situation where someone had 4 x gigabit pNICs, wanted an upgrade to 10GbE, and decided they would need 4 x 10GbE to see performance.

I'm not saying it won't be a problem, I'm saying, don't assume it is.

HeathReynolds
Enthusiast

Josh26 wrote:

Heath Reynolds wrote:

If you dedicate more links to vMotion and turn on multiadapter vMotion you will see a big difference in your vMotion performance.

In the real world, how often does anyone ever say "the performance of this vMotion is a problem"? I haven't used multiadapter vMotion outside the lab, because even the biggest VMs still seem to migrate in just a few seconds.

With larger VMs under heavy load we start running into issues using a single gigabit NIC. 32 GB guests under heavy load are where the problems begin: the same servers you couldn't vMotion in 4.1, because the memory pages changed faster than the delta could copy. In 5.0, stun during page send (SDPS) really improved the reliability of vMotion for larger guests, but it has caused some issues for us with certain workloads, since it introduces processor delay. We could correlate SAP batch jobs failing with vMotion events; running multiadapter vMotion fixed it, because we then had enough bandwidth for the delta to copy without SDPS introducing serious delay. 32 GB is the largest guest we will run in our gigabit environment; we run up to 128 GB in our 10G environment.
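The convergence point above can be sketched with a back-of-envelope model: pre-copy only converges while the vMotion link moves pages faster than the guest dirties them; otherwise SDPS has to slow the guest down. The dirty rate below is an illustrative number, not a measurement:

```python
# Back-of-envelope vMotion pre-copy model: the memory delta shrinks on each
# pass only if the link copies pages faster than the guest dirties them.
# Dirty rate and overhead-free link speed are illustrative assumptions.

def precopy_converges(link_gbps, dirty_rate_mb_s):
    """True if pre-copy can converge without SDPS throttling the guest."""
    link_mb_s = link_gbps * 1000 / 8  # raw MB/s, ignoring protocol overhead
    return link_mb_s > dirty_rate_mb_s

# A busy guest dirtying 200 MB/s of memory:
print(precopy_converges(1, 200))  # one 1 GbE uplink (~125 MB/s) -> False
print(precopy_converges(2, 200))  # two uplinks via multiadapter vMotion -> True
```

This is why the issue shows up only on large, busy guests: an idle 32 GB VM dirties almost nothing and migrates fine on one NIC, while a loaded one can outrun a single gigabit link indefinitely.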

It also greatly reduces the time it takes to put a host into maintenance mode. Running multiadapter vMotion on our big hosts, with four 10G CNAs and 1 TB of RAM, we can clear a host out for maintenance mode in about 8 minutes.

My sometimes relevant blog on data center networking and virtualization : http://www.heathreynolds.com
admin
Immortal

Thanks guys. I really don't think we are experiencing a bottleneck, but as mentioned, sometimes I don't think we see as good VMFS performance as we should. I also sometimes wonder whether guest throughput on some of our larger file servers is operating at its full potential.

We run our four 3750 SAN/VMFS switches in a stack, so the traffic traverses the 36 Gbps stacking backplane from the SAN-connected 3750s to the host-connected 3750s.
