4 Replies Latest reply on Aug 11, 2017 11:03 AM by aenagy

    Multicast performance test failed on vSAN 6.2

    aenagy Hot Shot

      I have set up a four-node vSAN 6.2 cluster (vCenter 6.0 build 3617395). The Multicast test fails, but the other two tests pass. The hosts (ESXi 6.0.0 Update 3 build-5050593) are nested (on ESXi 6.0.0 Update 3 build-5050593). Health shows all green except 'Hardware Compatibility', which is due to nesting. I have a dedicated VLAN and subnet, with vmkernel interfaces used exclusively for vSAN traffic. I have also added a static route for the multicast address on each host[1]:

       

      Network   Netmask         Gateway  Interface Source

      -------   -------         -------  --------- ------

      default   0.0.0.0         10.1.4.1 vmk0      MANUAL

      10.1.4.0  255.255.255.0   0.0.0.0  vmk0      MANUAL

      10.1.6.0  255.255.255.0   0.0.0.0  vmk1      MANUAL

      10.1.7.0  255.255.255.0   0.0.0.0  vmk2      MANUAL

      10.1.9.0  255.255.255.0   0.0.0.0  vmk4      MANUAL

      224.2.3.4 255.255.255.255 10.1.9.0 vmk4      MANUAL
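As a sanity check on the route target, the group address can be confirmed to be a valid IPv4 multicast address (vSAN 6.x uses 224.2.3.4 and 224.1.2.3 as its default multicast groups). A minimal sketch using Python's standard ipaddress module:

```python
import ipaddress

# vSAN 6.x default multicast group addresses (agent and master traffic).
for addr in ("224.2.3.4", "224.1.2.3"):
    ip = ipaddress.ip_address(addr)
    # is_multicast is True for anything in the IPv4 multicast range 224.0.0.0/4
    print(addr, ip.is_multicast)
```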

       

      When I experiment with the switch (a single Cisco SG300 in L3 mode) by enabling and disabling IGMP snooping and the IGMP querier, I get varying results -- see below. I have already posted a question to the Cisco forum to see what configuration changes, if any, could or should be made. Looking at the results below, I see very high packet loss, which I'm assuming is contributing to the test results being yellow or red.
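For reference, the LossPercentage column in the output below is just LossDatagrams over SentDatagrams; a minimal sketch of the arithmetic (truncation to a whole percent is my inference from the numbers, not documented behaviour):

```python
def loss_pct(lost_datagrams, sent_datagrams):
    # The reported LossPercentage appears to truncate, not round:
    # e.g. 23970/77565 = 30.9% is reported as 30.
    return int(100 * lost_datagrams / sent_datagrams)

print(loss_pct(7819, 77565))   # -> 10 (topvcpesxi03, first run)
print(loss_pct(23970, 77565))  # -> 30 (topvcpesxi02, first run)
print(loss_pct(40791, 65321))  # -> 62 (topvcpesxi03, second run)
```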

       

      Any ideas or suggestions?

       

      Thanks.

       

      P.S. I read Why proactive multicast performance test fails when health check passes?, but this doesn't seem to apply.

       

       

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes> Write-Host "Start test with Snooping Enabled and Querier Enabled" ; $TestResult = Test-VsanNetworkPerformance -Cluster ( Get-Cluster ) -Verbose ; $TestResult.HostResult | Format-Table -AutoSize ; Write-Host "End test with Snooping Enabled and Querier Enabled"

      Start test with Snooping Enabled and Querier Enabled

      VERBOSE: 2017-08-10 13:40:02 Test-VsanNetworkPerformance Started execution

      VERBOSE: 2017-08-10 13:40:46 Test-VsanNetworkPerformance Finished execution

      BandwidthBytesPerSecond IsClientInMulticast Host                       JitterMilliseconds LossPercentage LossDatagrams SentDatagrams Status TotalBytes

      ----------------------- ------------------- ----                       ------------------ -------------- ------------- ------------- ------ ----------

                    131072000                True topvcpesxi04.corp.ad.local                                                                       114022020

                     34207775               False topvcpesxi03.corp.ad.local 0.0260000005364418 10             7819          77565         yellow  102526620

                     26277255               False topvcpesxi02.corp.ad.local 0.0299999993294477 30             23970         77565         yellow   78784650

                     30538787               False topvcpesxi01.corp.ad.local 0.0260000005364418 19             15265         77565         yellow   91581000

      End test with Snooping Enabled and Querier Enabled

      ******************************************************************************************************************************************************

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes>

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes> Write-Host "Start test with Snooping Disabled and Querier Enabled" ; $TestResult = Test-VsanNetworkPerformance -Cluster ( Get-Cluster ) -Verbose ; $TestResult.HostResult | Format-Table -AutoSize ; Write-Host "End test with Snooping Disabled and Querier Enabled"

      Start test with Snooping Disabled and Querier Enabled

      VERBOSE: 2017-08-10 13:45:20 Test-VsanNetworkPerformance Started execution

      VERBOSE: 2017-08-10 13:46:03 Test-VsanNetworkPerformance Finished execution

      BandwidthBytesPerSecond IsClientInMulticast Host                       JitterMilliseconds LossPercentage LossDatagrams SentDatagrams Status TotalBytes

      ----------------------- ------------------- ----                       ------------------ -------------- ------------- ------------- ------ ----------

                    131072000                True topvcpesxi04.corp.ad.local                                                                        96023340

                     12020978               False topvcpesxi03.corp.ad.local 0.0430000014603138 62             40791         65321         red      36059100

                     22177906               False topvcpesxi02.corp.ad.local 0.0560000017285347 30             20052         65321         yellow   66545430

                     17085680               False topvcpesxi01.corp.ad.local 0.0500000007450581 46             30445         65321         red      51267720

      End test with Snooping Disabled and Querier Enabled

      ******************************************************************************************************************************************************

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes>

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes> Write-Host "Start test with Snooping Disabled and Querier Disabled" ; $TestResult = Test-VsanNetworkPerformance -Cluster ( Get-Cluster ) -Verbose ; $TestResult.HostResult | Format-Table -AutoSize ; Write-Host "End test with Snooping Disabled and Querier Disabled"

      Start test with Snooping Disabled and Querier Disabled

      VERBOSE: 2017-08-10 13:46:37 Test-VsanNetworkPerformance Started execution

      VERBOSE: 2017-08-10 13:47:26 Test-VsanNetworkPerformance Finished execution

      BandwidthBytesPerSecond IsClientInMulticast Host                       JitterMilliseconds LossPercentage LossDatagrams SentDatagrams Status TotalBytes

      ----------------------- ------------------- ----                       ------------------ -------------- ------------- ------------- ------ ----------

                    131072000                True topvcpesxi04.corp.ad.local                                                                       140358540

                     26909635               False topvcpesxi03.corp.ad.local 0.131999999284744  42             40548         95481         yellow   80751510

                     29760610               False topvcpesxi02.corp.ad.local 0.130999997258186  36             34722         95481         yellow   89315730

                     27572945               False topvcpesxi01.corp.ad.local 0.131999999284744  41             39191         95481         yellow   82746300

      End test with Snooping Disabled and Querier Disabled

      ******************************************************************************************************************************************************

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes>

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes> Write-Host "Start test with Snooping Enabled and Querier Disabled" ; $TestResult = Test-VsanNetworkPerformance -Cluster ( Get-Cluster ) -Verbose ; $TestResult.HostResult | Format-Table -AutoSize ; Write-Host "End test with Snooping Enabled and Querier Disabled"

      Start test with Snooping Enabled and Querier Disabled

      VERBOSE: 2017-08-10 13:48:14 Test-VsanNetworkPerformance Started execution

      VERBOSE: 2017-08-10 13:48:57 Test-VsanNetworkPerformance Finished execution

      BandwidthBytesPerSecond IsClientInMulticast Host                       JitterMilliseconds LossPercentage LossDatagrams SentDatagrams Status TotalBytes

      ----------------------- ------------------- ----                       ------------------ -------------- ------------- ------------- ------ ----------

                    131072000                True topvcpesxi04.corp.ad.local                                                                        93794820

                     15191524               False topvcpesxi03.corp.ad.local 0.0560000017285347 11             8994          77506         red     100712640

                     14165799               False topvcpesxi02.corp.ad.local 0.0549999997019768 17             13617         77506         red      93916830

                     14672569               False topvcpesxi01.corp.ad.local 0.0710000023245811 14             11334         77506         red      97272840

      End test with Snooping Enabled and Querier Disabled

      PowerCLI J:\Data\Projects\Programming\PowerShell\vSphere_networking\change_ESXi_static_routes>

       

       

      [1] vSAN Multicast performance test fails (2135495) | VMware KB

      ... which led me to the section "Multicast performance test of Virtual SAN health check does not run on Virtual SAN network" ...

      VMware Virtual SAN 6.1 Release Notes

        • 1. Re: Multicast performance test failed on vSAN 6.2
          TheBobkin Virtuoso
          vExpert, VMware Employee

          Hello aenagy,

           

           

          It wouldn't be surprising to see health check tests failing in a nested environment - they are not designed for this setup, and thus the tests' pass/fail thresholds are more likely to be crossed.

           

          Is cluster formation and communication functional and the cluster stable?

          If so, then multicast traffic is being passed and is adequate.

          Check the cluster membership revision count in 'esxcli vsan cluster get'; if it is not incrementing and the cluster is formed, then it is probably fine. Also check that you can create VMs with a Storage Policy applied, etc.
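To automate that check across runs, a rough sketch of pulling the relevant fields out of 'esxcli vsan cluster get' output (the sample text here is illustrative and trimmed to the fields mentioned in this thread; real output contains more):

```python
import re

# Illustrative, trimmed sample of "esxcli vsan cluster get" output.
sample = """\
Cluster Information
   Enabled: true
   Health State: HEALTHY
   Membership Entry Revision: 7
   Member Count: 4
"""

def field(name, text):
    # Pull the value of a "Name: value" line from the command output.
    m = re.search(rf"^\s*{re.escape(name)}:\s*(.+)$", text, re.MULTILINE)
    return m.group(1) if m else None

# A stable cluster shows the same member count on every node and a
# membership revision that is not constantly incrementing.
print(field("Membership Entry Revision", sample))  # -> 7
print(field("Member Count", sample))               # -> 4
```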

           

          I recall the other issue you referenced; I think that one was actually a broken health test, and that KB walks through the method of spoofing it so it can be 'all-green' again. I don't think the results of the tests you showed are indicative of this, as you got some warning (yellow) results, not just failed (red).

           

           

          Bob

          • 2. Re: Multicast performance test failed on vSAN 6.2
            aenagy Hot Shot

            Bob:

             

            It wouldn't be surprising to see health check tests failing in a nested environment - they are not designed for this setup, and thus the tests' pass/fail thresholds are more likely to be crossed.

            Given that there is hardly any load and it's such a small environment, I'm still a bit surprised. I checked esxtop on the physical host to see what was happening with the network traffic, and I can see that the ports for the nested hosts have a high %DRPRX. I am using a modified version of William Lam's virtual ESXi appliance in conjunction with the MAC learning filter, but this doesn't seem to be working the way it should. NFS and iSCSI on the same pHost seem to work fine, and the vSAN seems OK as near as I can tell.

             

            Is cluster formation and communication functional and the cluster stable?

            vSAN health looks OK, other than "Data > Virtual SAN object health", which shows a white "i" in a blue circle rather than a green/yellow/red icon.

             

            Check the cluster membership revision count in 'esxcli vsan cluster get'; if it is not incrementing and the cluster is formed, then it is probably fine. Also check that you can create VMs with a Storage Policy applied, etc.

            Output from "esxcli vsan cluster get" seems OK (Health State: HEALTHY; Membership Entry Revision and Member Count have the same values on all four nodes). I have been able to clone a virtual machine twice with the default vSAN policy, and then apply an FTT=1-RAID5 policy to one VM and an FTT=1-RAID1 policy to the other. The initial cloning of a 16 GB VM took more than an hour, which, needless to say, is alarming considering the vSAN cluster is all SSD. ESXTOP tells the story:

             

            Nested ESXi03:

             

            12:56:42am up 15:22, 546 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.04, 0.04, 0.03

             

            ADAPTR PATH                 NPTH   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd

            vmhba0 -                       0     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

            vmhba1 -                       4   113.55    73.81    39.74     0.29     2.48   394.55     4.21   398.75     6.28

            vmhba32 -                       1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

            vmhba33 -                       6     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

             

            Nested ESXi04:

             

            12:55:44am up 15:09, 544 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.04, 0.03, 0.03

             

            ADAPTR PATH                 NPTH   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd

            vmhba0 -                       0     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

            vmhba1 -                       4    79.35     0.00    79.35     0.00     0.31   328.38     0.01   328.39     0.00

            vmhba32 -                       1     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

            vmhba33 -                       6     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

             

            Obviously there is something seriously wrong. Even though this is a consumer-grade SSD (a Samsung 850 in an HP ProLiant DL360 Gen8), I wouldn't expect something this bad.
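For scale, a back-of-the-envelope check of the clone throughput (assuming the full 16 GB was written; since it took "more than an hour", this is an upper bound):

```python
# Effective throughput of the 16 GB clone, assuming exactly one hour.
# The clone actually took longer, so the real figure is even lower.
vm_mb = 16 * 1024
seconds = 60 * 60
print(f"{vm_mb / seconds:.1f} MB/s")  # ~4.6 MB/s: far below SSD-class speeds
```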

             

            I recall the other issue you referenced; I think that one was actually a broken health test, and that KB walks through the method of spoofing it so it can be 'all-green' again. I don't think the results of the tests you showed are indicative of this, as you got some warning (yellow) results, not just failed (red).

            Grasping at straws...

             

            I might have been expecting too much out of nested ESXi, but this really takes the cake. I did the HOL-1708-SDC-1 lab recently and don't remember it being this bad; mind you, the virtual machine was tiny (500 MB or so). That being said, I just want to make sure that I haven't missed anything obvious.

            • 3. Re: Multicast performance test failed on vSAN 6.2
              TheBobkin Virtuoso
              vExpert, VMware Employee

              Hello,

               

               

              That is very high DAVG and I wouldn't expect much performance with that slow a response time.

              I am unaware of whether this is expected with nested home-labs (I just use HOL or physical labs) so maybe a home-labber can weigh in here.
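As a consistency check on the esxtop row quoted above: guest latency (GAVG) should roughly equal device latency (DAVG) plus kernel latency (KAVG), and the quoted figures do add up, which places the bulk of the latency at the (virtual) device rather than in the VMkernel:

```python
# esxtop latency relation: GAVG ~= DAVG + KAVG (per-command averages, ms).
# Values from the nested ESXi03 vmhba1 row quoted earlier in the thread.
davg, kavg, gavg = 394.55, 4.21, 398.75
print(f"DAVG + KAVG = {davg + kavg:.2f} ms, GAVG = {gavg} ms")
assert abs((davg + kavg) - gavg) < 0.1
```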

               

              Do you have separate SSDs partitioned to back the cache and capacity tier drives?

              What driver and version are used for the local disk connection on the host?

               

               

              Bob

              • 4. Re: Multicast performance test failed on vSAN 6.2
                aenagy Hot Shot

                That is very high DAVG and I wouldn't expect much performance with that slow a response time.

                Yeah. This is worse than the spinning disks (3 x HP SAS 10k, RAID5) by orders of magnitude.

                 

                Do you have separate SSDs partitioned to back the cache and capacity tier drives?

                All of the nested ESXi hosts and their VMDKs are provisioned on the same VMFS datastore on the Samsung 850 EVO.

                 

                What driver and version used for the local disk connection on the host?

                I'm using HP's OEM ESXi:

                 

                [root@TOPELHhost01:~] esxcfg-scsidevs -a

                vmhba1  hpvsa             link-n/a  sata.vmhba1                             (0000:00:1f.2) Intel Corporation HP Dynamic Smart Array B120i RAID controller

                vmhba2  hpsa              link-n/a  sas.50014380278cf830                    (0000:03:00.0) Hewlett Packard Enterprise Smart Array P420

                [root@TOPELHhost01:~] vmkchdev -l | grep -i -e vmhba2

                0000:03:00.0 103c:323b 103c:3351 vmkernel vmhba2

                [root@TOPELHhost01:~] vmkload_mod -s hpsa

                vmkload_mod module information

                input file: /usr/lib/vmware/vmkmod/hpsa

                License: GPL

                Version: Version 6.0.0.124-1OEM, Build: 2494585, Interface: 9.2 Built on: Nov 16 2016

                Build Type: release

                Required name-spaces:

                  com.vmware.driverAPI#9.2.3.0

                  com.vmware.vmkapi#v2_3_0_0

                Parameters:

                  heap_max: int

                    Maximum attainable heap size for the driver.

                  heap_initial: int

                    Initial heap size allocated for the driver.

                  reply_queues: int

                    Specify desired number of reply queues. 1-16, default is 4, not to exceed number of online CPUs.

                  hpsa_simple_mode: int

                    Use 'simple mode' rather than 'performant mode'

                  hpsa_allow_any: int

                    Allow hpsa driver to access unknown HPE Smart Array hardware

                [root@TOPELHhost01:~] vmware -v

                VMware ESXi 6.0.0 build-5050593

                [root@TOPELHhost01:~]