Ryanware
Contributor

VSAN Performance service issue

Hello VSAN community

I have a strange issue in one of my customer's production environments. Since last week I have been getting a red error in vSAN Health for the Performance service: "Stats master election" failed.

When I try to turn off the vSAN Performance service, the task also fails with "The object or item referred to could not be found".

I updated all ESXi hosts from 6.5 U1 to 6.5 U2 with the latest patch, and vCenter is also on 6.5 U2, but I still get the same error.

I used the commands below to try to fix the issue, but they did not work.

On each host in the cluster, from an SSH session, run:

    /etc/init.d/vsanmgmtd restart   

    /etc/init.d/vsanvpd restart   

       

On vCenter, through an SSH session, restart vSAN Health:

    service-control --stop vmware-vsan-health   

    service-control --start vmware-vsan-health  

    service-control --stop vmware-sps   

    service-control --start vmware-sps    

       

I really don't know what to do. Do you have any recommendations or help?

Thanks

Alex

Accepted Solutions
vpradeep01
VMware Employee

We do not see any issue with the stats DB object itself. The components appear to be healthy, so this may just be an issue with the service. You could now attempt to delete the stats DB object, as suggested by TheBobkin, and re-enable the performance service.

1. Check and confirm that the referenced object UUID is indeed the vSAN stats DB by looking at the "path" or "user friendly name" in the output of this command:

/usr/lib/vmware/osfs/bin/objtool getAttr -u 1127545b-6385-8e8f-e1de-ecb1d7afa240

2. Once you have confirmed that the object is indeed the stats DB, run the command below to delete it. Make sure the UUID is correct and is the object ID of the stats DB; once deleted, the object cannot be recovered.

/usr/lib/vmware/osfs/bin/objtool delete -u 1127545b-6385-8e8f-e1de-ecb1d7afa240 -f


If it fails, try running it on the object's owner host, esx-vsan2.

3. After the deletion, re-enable the vSAN performance service.
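The verification in steps 1 and 2 can be wrapped in a small guard so that a mistyped UUID never reaches the delete command. This is only a sketch: the function names are hypothetical, and it assumes that the getAttr output labels the relevant fields with "path" or "user friendly name" as described in step 1. It prints the delete command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative and not part of objtool.
OBJTOOL=/usr/lib/vmware/osfs/bin/objtool

# Succeeds only for a complete 8-4-4-4-12 hexadecimal UUID.
is_vsan_uuid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Reads getAttr output on stdin and succeeds if a "path" or
# "user friendly name" line mentions .vsan.stats.
# Assumption: those labels appear on the relevant lines of the output.
looks_like_stats_db() {
    grep -Ei 'path|user friendly name' | grep -Fq '.vsan.stats'
}

# Prints the delete command only when both checks pass; it never deletes anything itself.
confirm_stats_db_delete() {
    uuid=$1
    if ! is_vsan_uuid "$uuid"; then
        echo "refusing: '$uuid' is not a complete UUID" >&2
        return 1
    fi
    if ! "$OBJTOOL" getAttr -u "$uuid" | looks_like_stats_db; then
        echo "refusing: $uuid does not look like the stats DB object" >&2
        return 1
    fi
    echo "$OBJTOOL delete -u $uuid -f"
}
```

Calling `confirm_stats_db_delete 1127545b-6385-8e8f-e1de-ecb1d7afa240` on a host would print the delete command from step 2 only if both checks pass, leaving the final decision to run it with you.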

Thanks

10 Replies
TheBobkin
VMware Employee

Hello Alex,

What I find strange there is the seemingly conflicting information - the 'Stats DB Object' check is green (which includes the Object health check) but the Object is not found when you try to disable the service. Can you check that the Proactive VM creation test passes just in case you are having a VASA/SPBM issue here?:

Cluster > Monitor > vSAN > Proactive Tests > VM Creation Test

If this is okay, I would advise starting with manually checking the state, component locations and Owner of this Object via RVC:

> vsan.perf.stats_object_info  <PathToCluster>

The Object that the Performance service writes to (Stats DB Object) can also be removed and recreated via RVC (you can see all the options by typing vsan.perf.stats_object_ TAB TAB).

Bob

Ryanware
Contributor

Hello Bob

Thanks for your reply. VM creation is working fine, and based on your advice I logged in to RVC.

Checking the performance stats object does not show me any error, but when I try to delete it with vsan.perf.stats_object_delete, I get an error:

/computers> vsan.perf.stats_object_delete 1

Deleting vSAN Stats DB object, which will stop vSAN Performance Service ...

Task: Disable vSAN performance service

New progress: 1%

Task result: error

TheBobkin
VMware Employee

Hello Alex,

And what output do you see when you run vsan.perf.stats_object_info?

If it still exists and is unhealthy it may be possible to manually delete the Stats DB Object using Objtool (obviously make sure you are deleting the correct Object) or remediate it otherwise via abdication.

If you SSH to one of the hosts and go down to the vsanDatastore (/vmfs/volumes/vsanDatastore with default name) and use ls -lah, do you see a .vsan.stats symlink? And if so, can you cd into the namespace it points to?

What does the state of the components look like when you query the UUID of the namespace with # cmmds-tool find -t DOM_OBJECT -f json -u <UUID> (and/or using vsan.object_info <PathToCluster> <UUID> )?
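The symlink check above can be sketched as a small helper that resolves .vsan.stats to the namespace UUID, which can then be passed to cmmds-tool. The function name is hypothetical, and the sketch assumes that readlink and basename are available in the host's shell and that the datastore uses the default name unless another path is given.

```shell
#!/bin/sh
# Hypothetical helper: print the namespace UUID that the .vsan.stats symlink points to.
# $1: datastore directory (assumed default: /vmfs/volumes/vsanDatastore)
stats_namespace_uuid() {
    ds=${1:-/vmfs/volumes/vsanDatastore}
    link="$ds/.vsan.stats"
    if [ ! -L "$link" ]; then
        echo "no .vsan.stats symlink in $ds" >&2
        return 1
    fi
    # basename covers the case where the link target is an absolute path
    basename "$(readlink "$link")"
}
```

For example, `uuid=$(stats_namespace_uuid) && cmmds-tool find -t DOM_OBJECT -f json -u "$uuid"` would run the component query only if the symlink exists.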

Bob

Ryanware
Contributor

Hello Bob

Here is the output of vsan.perf.stats_object_info:

> vsan.perf.stats_object_info 1

Directory Name: .vsan.stats

vSAN Object UUID: 1127545b-6385-8e8f-e1de-ecb1d7afa240

SPBM Profile: vSAN Default Storage Policy

vSAN Policy: cacheReservation: 0, checksumDisabled: 0, spbmProfileName: vSAN Default Storage Policy, spbmProfileId: aa6d5a82-1c88-45da-85d3-3d74b91a5bad, CSN: 166, proportionalCapacity: 0, spbmProfileGenerationNumber: 2, stripeWidth: 1, forceProvisioning: 0, SCSN: 163, hostFailuresToTolerate: 1

vSAN Object Health: healthy

DOM Object: 1127545b-6385-8e8f-e1de-ecb1d7afa240 (v5, owner: esx-vsan2.XXXX, proxy owner: None, policy: hostFailuresToTolerate = 1, CSN = 166, spbmProfileName = vSAN Default Storage Policy, forceProvisioning = 0, proportionalCapacity = 0, cacheReservation = 0, SCSN = 163, spbmProfileGenerationNumber = 2, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, stripeWidth = 1, checksumDisabled = 0)

  RAID_1

    Component: 8b7ccc5b-4731-cca3-9b35-ecb1d7afa960 (state: ACTIVE (5), host: esx-vsan4.XXX.com, md: naa.5000c5008f90314b, ssd: naa.500a07511077a844,

                                                     votes: 1, usage: 4.5 GB, proxy component: false)

    Component: 53ced25b-9fb5-6c65-1c2b-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan2.XXXX.com, md: naa.5000c5008f6f2c03, ssd: naa.5002538c40189cf8,

                                                     votes: 1, usage: 4.5 GB, proxy component: false)

  Witness: a8ced25b-3214-42ae-d8b1-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan3.XXX.com, md: naa.5000c5008f227e4b, ssd: naa.5002538c40079cc6,

                                                 votes: 1, usage: 0.0 GB, proxy component: false)

  Extended attributes:

    Address space: 273804165120B (255.00 GB)

    Object class: vmnamespace

    Object path: /vmfs/volumes/vsan:5233711eafa2f823-3a7993e596f4f532/

    Object capabilities: NONE
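The two fields in this output that matter for any manual cleanup are the object UUID and the DOM owner host. As a rough sketch, they can be pulled out of saved output with sed. The function names are hypothetical, and the patterns assume the line layout shown above ("vSAN Object UUID: ..." and "DOM Object: ... (vN, owner: ...,"), which may differ between versions.

```shell
#!/bin/sh
# Hypothetical helpers that read saved vsan.perf.stats_object_info output on stdin.

# Print the UUID from the "vSAN Object UUID:" line.
stats_object_uuid() {
    sed -n 's/^ *vSAN Object UUID: *\([0-9a-fA-F-]*\).*/\1/p'
}

# Print the owner host from the "DOM Object:" line.
# [^(]* stops at the first "(" so that "proxy owner:" is never matched.
stats_object_owner() {
    sed -n 's/^ *DOM Object: [^(]*(v[0-9]*, owner: \([^,]*\),.*/\1/p'
}
```

With the output saved to a file, `stats_object_owner < info.txt` would print the owner host name exactly as it appears in the "DOM Object" line.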

Ryanware
Contributor

When I go to the .vsan.stats directory, there are no files in it:

ls: ./stats.db.lck: No such file or directory

ls: ./stats.db.tpl: No such file or directory

Actually, I had manually removed all the files earlier to see whether that would solve the problem, but it did not fix the issue.

Ryanware
Contributor

Thanks, VPradeep, it's solved with your advice :)

Thanks, Bob, you also helped me a lot in getting to the end :)

StephenS
VMware Employee

Hi

 

I'm having a very similar error with my home lab vSAN setup.  I have a constant "Stats primary election" error in Skyline Health/Performance service.  Tried disabling/enabling Performance service, tried removing all hosts from cluster, deleting/creating cluster and starting fresh, always get "Stats primary election error".  Everything else appears healthy.  Feels like old objects are still on the vSAN datastore (.vsan.stats, and two older vCLS perf VMs).  Even with vSAN Performance Service disabled, I still see this:

[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] ls -al
ls: ./vCLS-73947cf9-2697-4f64-992a-f5fa0c366729: Input/output error
ls: ./78d23563-5d54-b305-bb98-0050569a1de6: Input/output error
ls: ./.vsan.stats: Input/output error
ls: ./5ed23563-55ac-861c-e0db-0050569a20c1: Input/output error
ls: ./vCLS-27868099-a321-4ebc-81ce-0463c1e8ec10: Input/output error
ls: ./b9d23563-f802-e072-e9ee-0050569a20c1: Input/output error
total 1024
drwxr-xr-x 1 root root 512 Sep 30 13:55 .
drwxr-xr-x 1 root root 512 Sep 30 13:55 ..
drwxr-xr-t 1 root root 77824 Sep 29 17:16 9bd23563-34fd-a4a8-014d-0050569a12ed
lrwxr-xr-x 1 root root 36 Sep 30 13:55 vCLS-0d4f9299-9bf1-4aa5-a7ba-6f577f32850b -> 9bd23563-34fd-a4a8-014d-0050569a12ed
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]

 

If I try to remove the .vsan.stats object (which I'm not sure is really there), I get:

 

[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool getAttr -u 5ed23563-55ac-861c-e0db-0050569a20c1
Failed to get object attributes : Input/output error 327684.
object getAttr error: Failure
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool delete -u 5ed23563-55ac-861c-e0db-0050569a20c1
object deletion ioctl failed: Input/output error
object delete error: Failure
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]

 

Help!

TheBobkin
VMware Employee

@StephenS, try deleting it with the -f (force) flag, and perhaps from the node that is currently the DOM owner of the object.

 

If a new stats object then gets created successfully but still no master is elected or no stats are being contributed, validate that TCP port 80 is open between the nodes' vSAN IPs.
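One way to run that port check against every node in one go is a small loop around nc -z. This is a sketch with a hypothetical function name; the IPs you pass in are the vSAN IPs of the other nodes.

```shell
#!/bin/sh
# Hypothetical helper: probe TCP port 80 on each vSAN IP given as an argument.
# Prints one line per IP and returns non-zero if any probe fails.
check_port_80() {
    failed=0
    for ip in "$@"; do
        if nc -z "$ip" 80 >/dev/null 2>&1; then
            echo "$ip: port 80 reachable"
        else
            echo "$ip: port 80 NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it from each node in turn, for example `check_port_80 192.168.3.102 192.168.3.103 192.168.3.104`, since the port needs to be open in both directions between all nodes.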

StephenS
VMware Employee

Here's what I get trying to force delete the object:

[root@vesxi-02:~] /usr/lib/vmware/osfs/bin/objtool delete -u 30393763-dca4-0ab0-7099-0050569a12ed -f -v 10
object deletion ioctl failed: Input/output error
object delete error: Failure
[root@vesxi-02:~]

Maybe unrelated, but it seems that every time I run this, an additional Inaccessible Object shows up under Cluster / Monitor / vSAN / Virtual Objects. Running the same command against any of them results in the same failure.

As for the stats error, a new stats object is indeed getting created successfully, but still no master is getting elected. I tried this:

[root@vesxi-01:~] nc -z 192.168.3.102 80
Connection to 192.168.3.102 80 port [tcp/http] succeeded!
[root@vesxi-01:~] nc -z 192.168.3.103 80
Connection to 192.168.3.103 80 port [tcp/http] succeeded!
[root@vesxi-01:~] nc -z 192.168.3.104 80
Connection to 192.168.3.104 80 port [tcp/http] succeeded!
[root@vesxi-01:~]

Any other thoughts??
