30 Replies - Latest reply on Jan 16, 2019 11:35 AM by ctoronto

    Slow Network Transfer Speeds between Guests

    ctoronto Novice

      I'm currently experiencing slow network transfer speeds between guests in my environment.  Current speeds fluctuate, but I generally get anywhere from 11 to 65 MB/s, with the average closer to 20 - 35 MB/s. All VMs are on the same subnet and VLAN.  To help with my testing I created a 6 GB file using the fsutil command and copied it back and forth between the same machines, and I still got the same results.  The same is true when sending this file between VMs on the same host.  Shouldn't I be seeing closer to full gigabit speed, especially when they are on the same host?
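      For reference, a test file of that size can be generated with fsutil roughly like this (the path is only an example):

          rem create a 6 GB (6442450944-byte), zero-filled test file for copy tests
          fsutil file createnew C:\temp\testfile.bin 6442450944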

       

      My setup is as follows:

       

      • 2 ESX Hosts running 6.0
      • 20 VMs total (2008 R2, Win7, Linux)
      • vCenter running 6.7
      • Gigabit physical switch with Link Aggregation.
      • iSCSI multi-path to Array (12 bay - 10 HD 7200rpm - 2 SSD as cache)

       

      Things I have tested:

       

      • Crystal Benchmark - 200 MB/s from a VM with sequential tests.
      • Shut down non-essential VMs to ensure more available resources.  Speed test had the same results as before.
      • Moved VMs to local storage on the ESX machines.  No difference.
      • Tested NICs without load balancing to rule out a faulty physical configuration.  No difference.
      • Tried the vmxnet3 driver - No difference.
      • Tested with FTP - No difference.
      • Tested from VM to physical - Speed increased to as high as 100 MB/s, but wasn't consistent.

      Network Layout.png

       

       

        • 1. Re: Slow Network Transfer Speeds between Guests
          a.p. Guru

          In most cases such issues are related to the storage rather than the network.

          To see whether the network throughput is as expected you should use a tool which transfers data without storage access, e.g. iperf or NetIO.
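          For example, a memory-to-memory test between two guests could be run roughly like this (the IP address, duration, and stream count are placeholders; iperf3 uses essentially the same flags):

              # on the receiving VM
              iperf -s

              # on the sending VM: 30-second test with 4 parallel streams
              iperf -c 192.168.10.20 -t 30 -P 4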

           

          André

          • 2. Re: Slow Network Transfer Speeds between Guests
            ctoronto Novice

            I have tested the VMs by moving them to local datastores on each of the ESX hosts, thereby bypassing the iSCSI infrastructure.  When I did that, I did not see any noticeable change.

             

            I have tested with iperf, and I appear to be getting about 950 Mbps.  This number goes up when the VMs are on the same host, usually somewhere around 4.5 to 5 Gbps.

            • 3. Re: Slow Network Transfer Speeds between Guests
              daphnissov Guru

              As André has pointed out, this is usually caused by poor underlying storage performance, and your test results would seem to confirm it. Your local storage has different performance characteristics than this shared storage, so you should be looking at the shared storage to determine the cause. I will say that what looks immediately suspect to me are the 7.2K SATA drives, which are horrible for virtual machines as they cannot do random writes well at all.

              • 4. Re: Slow Network Transfer Speeds between Guests
                ctoronto Novice

                What would be the best way to test this?  Wouldn't shutting down 3/4 of the VM environment have helped with determining this?  Also, wouldn't moving VMs to different storage devices show different results?  So far, no matter the change, I still get roughly the same results.  Note: to supplement the 7.2K drives, I have SSDs set up as cache.

                 

                Thanks for your reply.

                • 5. Re: Slow Network Transfer Speeds between Guests
                  a.p. Guru

                  What type/model of storage array do you use?

                  Does it use enterprise SSDs and NL-SAS disks, or consumer SSDs and SATA disks?

                  SSDs are certainly fast, but it really depends on the storage controller and/or how the storage software uses them.

                   

                  André

                  • 6. Re: Slow Network Transfer Speeds between Guests
                    ctoronto Novice

                    Synology RS2416+

                     

                    I'm pretty sure that it can take a wide variety of drives, including SAS, SATA, and SSD.  With that said, I wouldn't call it an enterprise array.  Currently I have 10x 2TB WD Gold drives running in RAID 6.  I have 2 SSDs that are set up as read cache -- one for each.

                     

                    I'm running 21 VMs:

                         4x - Windows Server
                         8x - Windows 7
                         6x - Linux Ubuntu
                         3x - Virtual Appliances

                     

                    With that said, do you feel that I'm running at expected performance?  If so, how can I tell?  My array metrics seem quite low for the usage (metrics are for one day).

                     

                         CPU           = 20% with a few small spikes up to 60%

                         Network       1 = Management
                                       2 = Highest Received 30 MB/s / Highest Sent 23 MB/s -- Multipath
                                       3 = Highest Received 34 MB/s / Highest Sent 24 MB/s -- Multipath
                                       4 = HA

                         Disk          = Averages less than 10% but occasionally spikes to 59% (highest)

                         iSCSI         IOPS = Average 100; the highest spike was brief, but hit 1,769
                                       Queue Depth = Average 0; the highest spike was 5.

                    • 7. Re: Slow Network Transfer Speeds between Guests
                      daphnissov Guru

                      Although RAID-6 with mechanical hard drives is terrible for performance (there is a heavy write penalty from the double parity calculations), with 10 of them (the SSD cache makes no difference here because it's a read cache only) I'd still expect you to be getting way more than that. Provide some more details about your Synology setup. You're using iSCSI? What type? What version of DSM? How are your hosts connected to this storage? What's the networking topology in place here? I'd also grab I/O Analyzer and deploy it to run as a testbed.

                      • 8. Re: Slow Network Transfer Speeds between Guests
                        a.p. Guru

                        I really can't tell you what you can expect, but IMO you should at least see better read performance with the SSDs as read cache, unless you have lots of cache misses.

                        Anyway, did you configure your environment according to the vendor's recommendation (see Knowledge Base | Synology Inc)?

                        Especially the point about disabling DelayedAck may be important.

                        For how to disable this on a production system see e.g. https://kb.vmware.com/kb/1002598
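                        As a rough sketch, the current setting can be checked from the ESXi shell, and there is a commonly cited per-adapter CLI change as well (the adapter name is a placeholder, and the exact key and syntax should be verified against the KB above for your ESXi build; the change generally needs a maintenance window):

                            # check whether delayed ACK is currently enabled for the software iSCSI initiator
                            vmkiscsid --dump-db | grep Delayed

                            # assumed per-adapter form of the change -- confirm the key name and the adapter ID
                            # (e.g. via "esxcli iscsi adapter list") before using this in production
                            esxcli iscsi adapter param set --adapter=vmhba33 --key=DelayedAck --value=false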

                         

                        André

                        • 9. Re: Slow Network Transfer Speeds between Guests
                          ctoronto Novice

                          André

                           

                          The vendor's recommendation is almost identical to what I have.  The only difference is that mine is also employing multipath.  I haven't tried the DelayedAck change, and it sounds like it might be worth a shot.  I'll have to schedule an outage to test disabling DelayedAck.

                           

                          Thanks,

                           

                          Chase

                          • 10. Re: Slow Network Transfer Speeds between Guests
                            ctoronto Novice

                            We are using file-based iSCSI, as this is what was recommended by Synology at the time of installation.  DSM version 6.2.1-23824 Update 2.  Everything is connected via a single gigabit switch (soon upgrading to 2).  The switch is a 48-port HP 1920.  Currently there are multiple VLANs set up, with 2 VLANs dedicated to our iSCSI connections (VLAN 2, VLAN 3).

                             

                            We have 2 hosts that are identically configured.  Each has 4 physical adapters (2 of which are dedicated to iSCSI); the other 2 are for the guest connections.  The iSCSI ports are set up so that no tagging is done on the host, but rather at the switch.  Each host is set up to use multipath down the 2 physical connections.
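                            For what it's worth, a quick way to confirm that both paths are logged in and actually being used is from the ESXi shell (the device ID below is a placeholder):

                                # confirm both iSCSI vmkernel ports have active sessions to the array
                                esxcli iscsi session list

                                # show the path selection policy and working paths per device
                                esxcli storage nmp device list

                                # if only one path carries traffic, Round Robin can be set per device
                                esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR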

                             

                            On the array side, there are 4 NICs: 2 have been dedicated to iSCSI, one to management, and the other to the HA heartbeat.

                             

                            I will get back to you later with my I/O tests. However, below I have attached the results of Crystal Benchmark, which was run on 4 of our servers today.

                             

                             

                            (All values in MB/s)

                                        Seq Q32T1  Seq Q32T1  4K Q32T1  4K Q32T1  Seq     Seq    4K    4K
                                        Read       Write      Read      Write     Read    Write  Read  Write
                            Server1     195        96         108       17        107     63     8     3
                            Server2     195        140        110       22        107     62     8     3
                            Server3     180        88         100       11        103     53     7     3
                            Server4     65         37         7         3         58      34     6     2
                            Server1     194        103        107       18        107     67     7     3
                            Server2     192        100        104       3         107     96     8     2
                            Server3     189        116        103       8         103     55     7     2
                            Server4     60         45         4         3         57      34     5     2
                            Average     158.75     90.625     80.375    10.625    93.625  58     7     2.5

                             

                            Note that there is something going on with Server4.  I will delve into fixing that after I have resolved the issue at hand.

                             

                            Again, thanks for all your help.

                            • 11. Re: Slow Network Transfer Speeds between Guests
                              ctoronto Novice

                              Here are the results of my I/O tests.

                               

                                      

                              Test Performed              Workload Spec            IOPS      Read IOPS  Write IOPS  MBPS    Read MBPS  Write MBPS
                              Max Write IOPS 10min        0.5k_0%Read_0%Random     10623.29  0          10623.29    5.19    0          5.19
                              Max Write Throughput 10min  0.5k_0%Read_0%Random     13220.3   0          13220.3     6.46    0          6.46
                              Max Throughput 10min        512k_100%Read_0%Random   354.26    354.26     0           177.1   177.13     0
                              Max IOPS 10min              0.5k_100%Read_0%Random   39135.7   39135.7    0           19.11   19.11      0

                               

                              One thing I noticed that seems odd is the MBPS.  It seems very low -- lower than some of the CIFS tests that I had performed.
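                              For context, the MBPS column follows directly from the IOPS and the 0.5 KB block size in those workload specs (assuming 0.5k means 0.5 KB per I/O):

                                  10,623.29 IOPS x 0.5 KB ≈ 5,311.6 KB/s ≈ 5.19 MB/s    (1 MB = 1,024 KB)
                                  39,135.7  IOPS x 0.5 KB ≈ 19,567.9 KB/s ≈ 19.11 MB/s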

                              • 12. Re: Slow Network Transfer Speeds between Guests
                                daphnissov Guru

                                Wow, those write throughput statistics are horrible. No wonder you're noticing such bad performance. There is such a huge disparity in read vs write numbers because the read figures are being boosted by your SSD cache fronting your disk group. Here are some of the things I'd check and test:

                                 

                                1. I'd want to actually *see* how you have your virtual switches set up with regard to iSCSI and their associated vmkernel ports.
                                2. Your iSCSI portal on Synology is connected over L2 (from the vmkernel adapters), correct?
                                3. Do some network tests and look at response latencies from the vmkernel adapters to the iSCSI portal service. Check all adapters/uplinks. What does that look like? (See the sketch after this list.)
                                4. Check network stats for these vmkernels/uplinks. Are there dropped packets?
                                5. The file-based LUN on Synology has, in my experience (I own 2 units in my lab), provided the worst performance in exchange for the most flexibility on the VMware side. This is just a trade-off you have to weigh for yourself. But provision a block LUN (multiple LUNs on RAID) and run some tests with I/O Analyzer to compare. Also compare to an NFS v3 export backed by the same Synology. What do the numbers look like when stacked against each other?
                                6. What does your CPU and Memory utilization on the NAS look like? File-based iSCSI LUN takes the most system resources.
                                7. Do you have any active snapshots on this LUN within Synology?
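                                Relating to items 3 and 4, a minimal sketch of those checks from the ESXi shell (the vmkernel interface, NIC name, and target IP are placeholders):

                                    # latency from a specific iSCSI vmkernel port to the array's iSCSI IP, with
                                    # don't-fragment set and a jumbo-sized payload to validate MTU 9000 end to end
                                    vmkping -I vmk1 -d -s 8972 192.168.2.10

                                    # per-uplink statistics, including receive/transmit errors and drops
                                    esxcli network nic stats get -n vmnic2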

                                 

                                FYI, I'm leaving the country on vacation tomorrow and won't return for more than 2 weeks. I won't be able to respond during that time. Good luck.

                                • 13. Re: Slow Network Transfer Speeds between Guests
                                  ctoronto Novice

                                  1. Attached

                                  Attachments: iscsi1.PNG, vmkernel2.PNG, vmkernes.PNG, vmkernel3.PNG, vmkernel4.PNG

                                  2. I'm not quite sure what you mean by portal. However, our iSCSI traffic is dedicated to 2 of the Synology ports.  They are in turn connected to a switch which manages all the tagging and untagging of iSCSI traffic.  Likewise, we have 2 ports on each host that are dedicated to iSCSI traffic.  They are set up for MPIO and have no VLAN configured on them.  All management of the NAS happens on a different port that is purely dedicated to this function.  Port 4 is dedicated to HA.

                                  3. vmkping shows .135 to .134 ms.  Jumbo frames are also working over the connection.  I'm not sure what other testing I should do.  All adapters appear to be up and running with zero dropped packets.

                                  4. We have seen zero drops in packets on all interfaces.

                                  5. I talked to Synology, and they said block level was removed from DSM 6.2.  As far as NFS goes, the reason I didn't want to use NFS was because I wouldn't be able to leverage both NICs on my ESX machines.  Our belief was that doing LACP was only allowed if you purchased vCenter Enterprise Plus or higher.

                                  6. CPU averages under 20 percent. Memory is at 29%.  CPU spikes are seen as high as 50%.

                                  7. We do have snapshots, but none that are currently running.

                                  • 14. Re: Slow Network Transfer Speeds between Guests
                                    daphnissov Guru

                                    Follow-ups:

                                     

                                    1. What version of ESXi?
                                    2. What type of hardware?
                                    3. Is this not the ESXi software iSCSI initiator you have here?
                                    4. When you say "vmkping shows .135 to .134ms" you do mean less than one millisecond, correct, and not one hundred thirty-five milliseconds?
                                    5. "I talked to synology, and they said block level was removed removed from dsm 6.2" <== I didn't know this. I'm still on 6.1 myself.
                                    6. As far as using NFS, yes, I understand that, but as it stands right now with your current performance numbers (on writes) you're nowhere near saturating a 1 GbE uplink. I would still recommend you try it on a single host as an experiment to compare the results. You don't have to delete your iSCSI configuration as long as the NFS export is on one of the same networks you have. By bypassing the iSCSI stack and running performance tests you can eliminate a complex variable in the equation. (A quick mount sketch follows below.)
                                    7. "We do have snapshots, but none that are currently running." <==What does "currently running" mean here? What I meant was does this iSCSI LUN on the Synology side have an open or active snapshot against it?