11 Replies Latest reply on Jan 29, 2019 1:01 PM by Darking

    "All hosts contribution stats" warning

    Wolfman017 Lurker

      Hi,

      I have a stretched cluster of two ESXi 6.5 U2 hosts (SERVER1 and SERVER2) plus one witness (WITNESS1).

      When I check the vSAN health screen, everything is OK except one check, which is in Warning: "All hosts contributing stats". The warning says that host SERVER2 is not contributing stats.

      I followed this tutorial to determine which host is the stats master: vSAN hosts not contributing stats reports - vSAN Health check fails - vhabit

      My stats master is SERVER1.

      In the vsanmgmt.log file, I see these warnings:

      2019-01-09T14:21:49Z VSANMGMTSVC: WARNING vsanperfsvc[Collector-1] [statscollector::RetrieveRemoteStats] Error happened during RetrieveRemoteStats of host IP_of_SERVER2, type: <class 'socket.timeout'>, message: timed out

      2019-01-09T14:21:50Z VSANMGMTSVC: WARNING vsanperfsvc[Collector-1] [statscollector::RetrieveRemoteStats] Error happened during RetrieveRemoteStats of host IP_of_WITNESS1, type: <class 'socket.timeout'>, message: timed out
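      To see at a glance which hosts fail and with which error, you can tally these warnings per host and exception type. The helper below is hypothetical (it is not a VMware tool): a minimal sketch that assumes the warning format quoted above and assumes the log lives at /var/log/vsanmgmt.log on the stats master.

```python
import re
from collections import Counter

# Hypothetical helper: matches the vsanperfsvc warning format quoted in this thread, e.g.
# "... RetrieveRemoteStats of host 10.0.0.2, type: <class 'socket.timeout'>, message: timed out"
PATTERN = re.compile(
    r"RetrieveRemoteStats of host (?P<host>[^,]+), "
    r"type: <class '(?P<etype>[^']+)'>, message: (?P<msg>.*)$"
)


def summarize_stats_errors(lines):
    """Count RetrieveRemoteStats failures per (host, exception type)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("host"), match.group("etype"))] += 1
    return counts


def print_summary(path="/var/log/vsanmgmt.log"):
    """Print one line per failing host; the default path is an assumption."""
    with open(path, errors="replace") as handle:
        counts = summarize_stats_errors(handle)
    for (host, etype), count in sorted(counts.items()):
        print("{:<20} {:<25} {}".format(host, etype, count))
```

      Running print_summary() on the stats master would show whether only SERVER2 and WITNESS1 fail, and whether the failures are timeouts (socket.timeout) or connection errors such as OSError "[Errno 113] No route to host".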

       

       

      Communication between the hosts works fine; I checked it with vmkping. Everything else in vSAN works perfectly.

      Any ideas?

        • 1. Re: "All hosts contribution stats" warning
          TheBobkin Virtuoso
          VMware Employees vExpert

          Hello Wolfman017,

          The first step would be to restart vsanmgmtd and confirm that it is functional:

          # ps | grep vsan

          # /etc/init.d/vsanmgmtd stop

          Check that the vsanmgmtd processes are gone (in 6.7 there will be other processes with 'vsan' in their names, but don't mind these):

          # ps | grep vsan

          # /etc/init.d/vsanmgmtd start

          When you test with ping, are you using vmkping and specifying which interface to send from (-I vmkX)?

          Bob

          • 2. Re: "All hosts contribution stats" warning
            Wolfman017 Lurker

            Hi,

            I already stopped and restarted the service when enabling debug mode. I even rebooted both nodes of the cluster, and the warning persists.

            When running vmkping, I specify the vSAN VMkernel port. I didn't mention it earlier, but the IP addresses I refer to (in the vmkping tests and in the log) are the vSAN IPs, not the management IPs.

            • 3. Re: "All hosts contribution stats" warning
              Darking Novice

              Aloha!

              I would look at the log with debug logging enabled, in case you haven't already.

              Here are the settings you need to change on the MASTER node:

              VMware Knowledge Base

              I have a very similar case open with support at the moment for our 8-node stretched cluster, and my debug events look like this:

              2019-01-10T12:13:41Z VSANMGMTSVC: DEBUG vsanperfsvc[Collector-Main] [statscollector::_FetchAndCalculateStats] waiting for stats readiness

              2019-01-10T12:13:44Z VSANMGMTSVC: WARNING vsanperfsvc[Collector-2] [statscollector::RetrieveRemoteStats] Error happened during RetrieveRemoteStats of host 172.29.242.13, type: <class 'OSError'>, message: [Errno 113] No route to host

               

              For some very odd reason, it is trying to use the interface that carries the dedicated witness traffic, which only has traffic allowed to the witness site and not between the two data sites.

              I would have assumed it would use either the VMware management network or the vSAN network, but nope: mine is trying the witness traffic network.

              When I hear back from support, I'll post it here.

              • 4. Re: "All hosts contribution stats" warning
                Wolfman017 Lurker

                The servers communicate on the vSAN network. What do you mean by "witness network"? The "witnessPg" port group?

                Here is the log with debug logging enabled:

                2019-01-10T13:54:17Z VSANMGMTSVC: WARNING vsanperfsvc[Collector-1] [statscollector::RetrieveRemoteStats] Error happened during RetrieveRemoteStats of host VSAN_IP_of_SERVER2, type: <class 'socket.timeout'>, message: timed out

                2019-01-10T13:54:17Z VSANMGMTSVC: DEBUG vsanperfsvc[Collector-1] [statscollector::RetrieveRemoteStats] Traceback (most recent call last):
                  File "/build/mts/release/bora-10642691/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/statscollector.py", line 676, in RetrieveRemoteStats
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/VmomiSupport.py", line 557, in <lambda>
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/VmomiSupport.py", line 363, in _InvokeMethod
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/SoapAdapter.py", line 1303, in InvokeMethod
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/SoapAdapter.py", line 1369, in GetConne

                2019-01-10T13:54:17Z VSANMGMTSVC: WARNING vsanperfsvc[Collector-2] [statscollector::RetrieveRemoteStats] Error happened during RetrieveRemoteStats of host VSAN_IP_of_WITNESS, type: <class 'socket.timeout'>, message: timed out

                2019-01-10T13:54:17Z VSANMGMTSVC: DEBUG vsanperfsvc[Collector-2] [statscollector::RetrieveRemoteStats] Traceback (most recent call last):
                  File "/build/mts/release/bora-10642691/bora/build/vsan/release/vsanhealth/usr/lib/vmware/vsan/perfsvc/statscollector.py", line 676, in RetrieveRemoteStats
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/VmomiSupport.py", line 557, in <lambda>
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/VmomiSupport.py", line 363, in _InvokeMethod
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/SoapAdapter.py", line 1303, in InvokeMethod
                  File "/build/mts/release/bora-10719125/bora/build/esx/release/vmvisor/sys/lib64/python3.5/site-packages/pyVmomi/SoapAdapter.py", line 1369, in GetConne

                • 5. Re: "All hosts contribution stats" warning
                  Darking Novice

                  I'm referring to the witness traffic LAN.

                  It's a new feature in 6.7 U1, which I'm running; I forgot you are not on that release. Sorry about that.

                  In 6.5, the vSAN network is used for both witness and vSAN traffic.

                  Have you tried vmkping?

                  vmkping -I <interface of VSAN> destination-of-other-host
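                  The important part is forcing the source interface with -I; without it, vmkping picks an interface from the routing table and may not exercise the vSAN VMkernel port at all. Here is a minimal sketch for checking several peers in one go, assuming Python 3 is available in the ESXi shell and that vmkping exits with status 0 only when replies come back; the helper names are hypothetical.

```python
import subprocess


def vmkping_cmd(vmk, target, count=3):
    """Build a vmkping argument list that forces the source VMkernel interface (-I)."""
    return ["vmkping", "-I", vmk, "-c", str(count), target]


def check_peers(vmk, peers, count=3):
    """Ping each peer over the given vmk; map peer -> True if vmkping exited with 0.

    Assumes vmkping is on PATH (ESXi shell) and returns a non-zero status
    when no replies come back.
    """
    results = {}
    for peer in peers:
        proc = subprocess.run(vmkping_cmd(vmk, peer, count),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        results[peer] = proc.returncode == 0
    return results
```

                  For example, check_peers("vmk1", ["192.168.50.12", "192.168.60.20"]) would ping both peers over vmk1; the interface name and addresses are placeholders for your own vSAN vmk and the other hosts' vSAN IPs.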

                   

                  Otherwise, I would check whether a firewall is blocking the traffic (either a physical firewall or the firewall on the ESXi hosts).

                  • 6. Re: "All hosts contribution stats" warning
                    Darking Novice

                    Hi Wolfman017,

                    Any update on your issue?

                    I opened a case with GSS a week ago, and unfortunately I have not yet received an analysis or a resolution.

                    They are telling me they will report back tomorrow, but the level 2 tech and his senior were stumped :/

                    • 7. Re: "All hosts contribution stats" warning
                      Wolfman017 Lurker

                      Hi,

                      No, no news. I sent the issue to our TAM, and it is not a known issue.

                      We have another vSAN cluster on the same vCenter, and it has the same issue.

                      I just built another vSAN cluster (identical to the one with the issue) on a different vCenter, and it does not have the issue.

                      • 8. Re: "All hosts contribution stats" warning
                        Beingnsxpaddy Enthusiast
                        vExpert

                        Did you try restarting the management services on the ESXi hosts? And can you add a new host to the same cluster to see how it behaves?

                        • 9. Re: "All hosts contribution stats" warning
                          TheBobkin Virtuoso
                          VMware Employees vExpert

                          Hello Wolfman017 / Darking,

                          Do either of you by any chance have port 80 traffic blocked between the hosts and/or between the hosts and vCenter?

                          I ask because this was only added to our required ports documentation for this service fairly recently:

                          vSAN Network Port Requirements 
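                          To test the port 80 question from a host (or any machine on the same segment), a plain TCP connect is enough. This hypothetical helper, a sketch rather than a VMware tool, also separates the two failure modes seen in this thread: a silent timeout, which typically means packets are being dropped (for example by a firewall), versus errno 113 (EHOSTUNREACH, "No route to host"), which points at routing, as in Darking's witness traffic case. The 5-second default timeout is an arbitrary choice.

```python
import errno
import socket


def probe_tcp(host, port=80, timeout=5.0):
    """Try a TCP connect and classify the outcome.

    "timeout"  - no answer at all; typically packets silently dropped,
                 which matches the socket.timeout entries in vsanmgmt.log
    "no-route" - EHOSTUNREACH, like "[Errno 113] No route to host"
    "refused"  - the host answered, but nothing is listening on that port
    "open"     - the connection succeeded
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no-route"
        return "error: {}".format(exc)
    conn.close()
    return "open"
```

                          Calling probe_tcp() with each peer's IP and port 80 from the stats master would show whether port 80 is reachable, and if not, whether the traffic is being dropped or has no route.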

                           

                           

                          Bob

                          • 10. Re: "All hosts contribution stats" warning
                            Darking Novice

                            Hi TheBobkin,

                            We are still working with GSS, and a PR has been filed.

                            It seems to be a network configuration issue on our end, but it took a while for GSS to figure out what was going on.

                            We now see that there is an article on StorageHub that describes the setup in more detail, and we are in the process of making this change:

                            I would swear it didn't exist when we started the setup, but who knows!

                            Somehow we got it into our heads that WTS (Witness Traffic Separation) was supposed to be completely separated for each site, i.e. no routes between the primary and secondary sites. Now we can see from the design schematics that we were wrong.

                            I'll report back tomorrow once we have reconfigured the routes for the witness site and created a single VLAN for the witness traffic between the primary and secondary sites.

                            • 11. Re: "All hosts contribution stats" warning
                              Darking Novice

                              Just a small update on our issue.

                              We seem to have resolved it by establishing a route between the two sites for the WTS traffic.

                              All in all a logical solution, but we did not find the process of setting up Witness Traffic Separation very well documented when we did the initial setup. We will do extensive site failover testing tomorrow, and hopefully everything will work as expected.