vSAN1


VSAN Performance service issue

  • 1.  VSAN Performance service issue

    Posted Oct 26, 2018 11:18 AM

    Hello VSAN community

    I have a strange issue in one of my customer's production environments. Since last week, I have been getting a red error in the vSAN Health check for the Performance service: "Stats master election" failed.

    When I try to turn off the vSAN Performance service, the task also fails with "The object or item referred to could not be found".

    I updated all ESXi hosts from 6.5 U1 to 6.5 U2 with the latest patch, and vCenter is also on 6.5 U2, but the error is still the same.

    I used the commands below to try to fix the issue, but they did not work.

    On each host in the cluster, from an SSH session, run:

        /etc/init.d/vsanmgmtd restart   

        /etc/init.d/vsanvpd restart   

           

    On vCenter, through an SSH session, restart vSAN Health:

        service-control --stop vmware-vsan-health   

        service-control --start vmware-vsan-health  

        service-control --stop vmware-sps   

        service-control --start vmware-sps    
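
    For anyone repeating the per-host part of this across a cluster, here is a minimal sketch of a loop that runs the two ESXi restarts over SSH. The function name and the host names are hypothetical, root SSH access to each host is assumed, and it defaults to a dry run that only prints what it would execute.

```shell
#!/bin/sh
# Sketch only: restart the vSAN management daemons on each ESXi host.
# restart_vsan_daemons is a hypothetical helper, not a VMware tool.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
restart_vsan_daemons() {
    for host in "$@"; do
        for svc in vsanmgmtd vsanvpd; do
            cmd="/etc/init.d/$svc restart"
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "ssh root@$host $cmd"
            else
                ssh "root@$host" "$cmd" \
                    || echo "WARN: $svc restart failed on $host" >&2
            fi
        done
    done
}

# Example (placeholder host names):
#   restart_vsan_daemons esx-vsan2 esx-vsan3 esx-vsan4
#   DRY_RUN=0 restart_vsan_daemons esx-vsan2 esx-vsan3 esx-vsan4
```

    The vCenter-side service-control commands are left out of the loop because they only need to run once, on the vCenter appliance.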

           

    I really don't know what else to do. Do you have any recommendations or help?

    Thanks

    Alex



  • 2.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 02:51 PM

    Hello Alex,

    What I find strange there is the seemingly conflicting information - the 'Stats DB Object' check is green (which includes the Object health check) but the Object is not found when you try to disable the service. Can you check that the Proactive VM creation test passes just in case you are having a VASA/SPBM issue here?:

    Cluster > Monitor > vSAN > Proactive Tests > VM Creation Test

    If this is okay, I would advise starting with manually checking the state, component locations and Owner of this Object via RVC:

    > vsan.perf.stats_object_info  <PathToCluster>

    The Object that the Performance service writes to (Stats DB Object) can also be removed and recreated via RVC (you can see all the options by typing vsan.perf.stats_object_ TAB TAB).

    Bob



  • 3.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 03:25 PM

    Hello Bob

    Thanks for your reply. VM creation is working fine, and based on your advice I logged in to RVC.

    Checking the performance stats object does not show an error, but when I try to delete it with vsan.perf.stats_object_delete it shows an error:

    /computers> vsan.perf.stats_object_delete 1

    Deleting vSAN Stats DB object, which will stop vSAN Performance Service ...

    Task: Disable vSAN performance service

    New progress: 1%

    Task result: error



  • 4.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 04:08 PM

    Hello Alex,

    And what output do you see when you run vsan.perf.stats_object_info?

    If it still exists and is unhealthy it may be possible to manually delete the Stats DB Object using Objtool (obviously make sure you are deleting the correct Object) or remediate it otherwise via abdication.

    If you SSH to one of the hosts and go down to the vsanDatastore (/vmfs/volumes/vsanDatastore with default name) and use ls -lah, do you see a .vsan.stats symlink? And if so, can you cd into the namespace it points to?

    What does the state of the components look like when you query the UUID of the namespace with # cmmds-tool find -t DOM_OBJECT -f json -u <UUID> (and/or using vsan.object_info <PathToCluster> <UUID> )?
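
    As a rough sketch of the symlink check above: the helper below reads where .vsan.stats points and prints the namespace UUID, which can then be fed to cmmds-tool. The function name is hypothetical, and the default path assumes the datastore still has its default name.

```shell
#!/bin/sh
# Sketch only: print the UUID that the .vsan.stats symlink points to.
# stats_object_uuid is a hypothetical helper; the default datastore path
# assumes the vSAN datastore has not been renamed.
stats_object_uuid() {
    ds="${1:-/vmfs/volumes/vsanDatastore}"
    link="$ds/.vsan.stats"
    if [ ! -L "$link" ]; then
        echo "no .vsan.stats symlink in $ds" >&2
        return 1
    fi
    # The link target is the namespace directory, named after the UUID.
    basename "$(readlink "$link")"
}

# Example, run on an ESXi host:
#   uuid=$(stats_object_uuid) && cmmds-tool find -t DOM_OBJECT -f json -u "$uuid"
```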

    Bob



  • 5.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 04:16 PM

    Hello Bob

    Here is the vsan.perf.stats_object_info

    > vsan.perf.stats_object_info 1

    Directory Name: .vsan.stats

    vSAN Object UUID: 1127545b-6385-8e8f-e1de-ecb1d7afa240

    SPBM Profile: vSAN Default Storage Policy

    vSAN Policy: cacheReservation: 0, checksumDisabled: 0, spbmProfileName: vSAN Default Storage Policy, spbmProfileId: aa6d5a82-1c88-45da-85d3-3d74b91a5bad, CSN: 166, proportionalCapacity: 0, spbmProfileGenerationNumber: 2, stripeWidth: 1, forceProvisioning: 0, SCSN: 163, hostFailuresToTolerate: 1

    vSAN Object Health: healthy

    DOM Object: 1127545b-6385-8e8f-e1de-ecb1d7afa240 (v5, owner: esx-vsan2.XXXX, proxy owner: None, policy: hostFailuresToTolerate = 1, CSN = 166, spbmProfileName = vSAN Default Storage Policy, forceProvisioning = 0, proportionalCapacity = 0, cacheReservation = 0, SCSN = 163, spbmProfileGenerationNumber = 2, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, stripeWidth = 1, checksumDisabled = 0)

      RAID_1

        Component: 8b7ccc5b-4731-cca3-9b35-ecb1d7afa960 (state: ACTIVE (5), host: esx-vsan4.XXX.com, md: naa.5000c5008f90314b, ssd: naa.500a07511077a844,

                                                         votes: 1, usage: 4.5 GB, proxy component: false)

        Component: 53ced25b-9fb5-6c65-1c2b-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan2.XXXX.com, md: naa.5000c5008f6f2c03, ssd: naa.5002538c40189cf8,

                                                         votes: 1, usage: 4.5 GB, proxy component: false)

      Witness: a8ced25b-3214-42ae-d8b1-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan3.XXX.com, md: naa.5000c5008f227e4b, ssd: naa.5002538c40079cc6,

                                                     votes: 1, usage: 0.0 GB, proxy component: false)

      Extended attributes:

        Address space: 273804165120B (255.00 GB)

        Object class: vmnamespace

        Object path: /vmfs/volumes/vsan:5233711eafa2f823-3a7993e596f4f532/

        Object capabilities: NONE



  • 6.  RE: VSAN Performance service issue
    Best Answer

    Broadcom Employee
    Posted Oct 26, 2018 04:49 PM

    We do not see any issue with the stats DB object itself. The components appear to be healthy, so this may just be an issue with the service. You could now attempt to delete the stats DB object as suggested by Bobkin and re-enable the performance service.

    1. Check and confirm that the referenced object UUID is indeed the vSAN stats DB by looking at the "path" or "user friendly name" in the output:

    /usr/lib/vmware/osfs/bin/objtool getAttr -u 1127545b-6385-8e8f-e1de-ecb1d7afa240

    2. Once you have confirmed the object is indeed the stats DB, run the command below to delete it. Make sure the UUID is correct and is the object ID of the stats DB; once deleted, the object cannot be recovered.

    /usr/lib/vmware/osfs/bin/objtool delete -u 1127545b-6385-8e8f-e1de-ecb1d7afa240 -f


    If it fails, try on the owner host (esx-vsan2).

    3. After the deletion, re-enable the vSAN performance service.
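
    Since the delete cannot be undone, steps 1 and 2 could be wrapped in a small guard like the sketch below. The function names are hypothetical, and it assumes that the getAttr output contains the string .vsan.stats in its path or user friendly name field, so check the real output on your build before relying on it.

```shell
#!/bin/sh
# Sketch only: refuse to delete anything that does not look like the stats DB.
# Function names are hypothetical; OBJTOOL can be overridden for testing.
OBJTOOL="${OBJTOOL:-/usr/lib/vmware/osfs/bin/objtool}"

# Succeeds only for strings shaped like a vSAN object UUID (8-4-4-4-12 hex).
is_vsan_uuid() {
    echo "$1" | grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Reads getAttr output on stdin and succeeds if it mentions .vsan.stats.
# Assumption: the path / user friendly name field contains that string.
mentions_stats_db() {
    grep -Fq '.vsan.stats'
}

delete_stats_object() {
    uuid="$1"
    is_vsan_uuid "$uuid" || { echo "not a UUID: $uuid" >&2; return 2; }
    "$OBJTOOL" getAttr -u "$uuid" | mentions_stats_db \
        || { echo "refusing: $uuid does not look like the stats DB" >&2; return 3; }
    "$OBJTOOL" delete -u "$uuid" -f
}

# Example, run on the owner host:
#   delete_stats_object 1127545b-6385-8e8f-e1de-ecb1d7afa240
```

    The UUID shape check also catches a truncated paste, which would otherwise only show up as an objtool error.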

    Thanks



  • 7.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 05:04 PM

    Thanks, VPradeep, it's solved with your advice :smileyhappy:

    Thanks Bob, you also helped me a lot to get there :smileyhappy:



  • 8.  RE: VSAN Performance service issue

    Broadcom Employee
    Posted Sep 30, 2022 02:01 PM

    Hi

     

    I'm having a very similar error with my home lab vSAN setup.  I have a constant "Stats primary election" error in Skyline Health/Performance service.  Tried disabling/enabling Performance service, tried removing all hosts from cluster, deleting/creating cluster and starting fresh, always get "Stats primary election error".  Everything else appears healthy.  Feels like old objects are still on the vSAN datastore (.vsan.stats, and two older vCLS perf VMs).  Even with vSAN Performance Service disabled, I still see this:

    [root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] ls -al
    ls: ./vCLS-73947cf9-2697-4f64-992a-f5fa0c366729: Input/output error
    ls: ./78d23563-5d54-b305-bb98-0050569a1de6: Input/output error
    ls: ./.vsan.stats: Input/output error
    ls: ./5ed23563-55ac-861c-e0db-0050569a20c1: Input/output error
    ls: ./vCLS-27868099-a321-4ebc-81ce-0463c1e8ec10: Input/output error
    ls: ./b9d23563-f802-e072-e9ee-0050569a20c1: Input/output error
    total 1024
    drwxr-xr-x 1 root root 512 Sep 30 13:55 .
    drwxr-xr-x 1 root root 512 Sep 30 13:55 ..
    drwxr-xr-t 1 root root 77824 Sep 29 17:16 9bd23563-34fd-a4a8-014d-0050569a12ed
    lrwxr-xr-x 1 root root 36 Sep 30 13:55 vCLS-0d4f9299-9bf1-4aa5-a7ba-6f577f32850b -> 9bd23563-34fd-a4a8-014d-0050569a12ed
    [root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]

     

    If I try to remove the .vsan.stats object (which I'm not sure is really there), I get:

     

    [root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool getAttr -u 5ed23563-55ac-861c-e0db-0050569a20c1
    Failed to get object attributes : Input/output error 327684.
    object getAttr error: Failure
    [root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool delete -u 5ed23563-55ac-861c-e0db-0050569a20c1
    object deletion ioctl failed: Input/output error
    object delete error: Failure
    [root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]

     

    Help!



  • 9.  RE: VSAN Performance service issue

    Posted Sep 30, 2022 02:41 PM

    Try deleting it with the -f (force) flag, and maybe from the node that is currently the DOM owner of it.

     

    If a new stats object then gets created successfully but still no master is elected or no stats are contributed, validate that TCP port 80 is open between the nodes' vSAN IPs.
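
    A minimal sketch of that port check, run from each host against the other nodes' vSAN IPs. The function name is hypothetical, the NC variable exists only so the probe command can be swapped out, and the IPs in the example are placeholders.

```shell
#!/bin/sh
# Sketch only: probe TCP port 80 on each given IP with nc -z.
# check_stats_port is a hypothetical helper; NC defaults to plain nc.
check_stats_port() {
    failed=0
    for ip in "$@"; do
        if ${NC:-nc} -z "$ip" 80 >/dev/null 2>&1; then
            echo "$ip: port 80 reachable"
        else
            echo "$ip: port 80 NOT reachable"
            failed=1
        fi
    done
    # Non-zero exit status if any node could not be reached.
    return "$failed"
}

# Example (placeholder vSAN IPs):
#   check_stats_port 192.168.3.102 192.168.3.103 192.168.3.104
```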



  • 10.  RE: VSAN Performance service issue

    Broadcom Employee
    Posted Sep 30, 2022 10:00 PM

    Here's what I get trying to force delete the object:

    [root@vesxi-02:~] /usr/lib/vmware/osfs/bin/objtool delete -u 30393763-dca4-0ab0-7099-0050569a12ed -f -v 10
    object deletion ioctl failed: Input/output error
    object delete error: Failure
    [root@vesxi-02:~]

    Maybe unrelated, but it seems like every time I run this, I get a new additional Inaccessible Object showing in the Cluster / Monitor / vSAN / Virtual Objects.  Trying the same command above with any of them results in the same failure.

    As for the stats error, a new stats object is indeed getting created successfully, but still no master is getting elected. I tried this:

    [root@vesxi-01:~] nc -z 192.168.3.102 80
    Connection to 192.168.3.102 80 port [tcp/http] succeeded!
    [root@vesxi-01:~] nc -z 192.168.3.103 80
    Connection to 192.168.3.103 80 port [tcp/http] succeeded!
    [root@vesxi-01:~] nc -z 192.168.3.104 80
    Connection to 192.168.3.104 80 port [tcp/http] succeeded!
    [root@vesxi-01:~]

    Any other thoughts??



  • 11.  RE: VSAN Performance service issue

    Posted Oct 26, 2018 04:24 PM

    When I go to the .vsan.stats directory, there are no files:

    ls: ./stats.db.lck: No such file or directory

    ls: ./stats.db.tpl: No such file or directory

    Actually, I had manually removed all the files earlier to see if that would solve the problem, but it did not fix the issue.