Hello vSAN community,
I have a strange issue in one of my customer's production environments. Since last week I get a red error in vSAN Health under Performance service: "Stats master election" failed.
When I try to turn off the vSAN performance service, the task also fails with "The object or item referred to could not be found".
I updated all ESXi servers from 6.5 U1 to 6.5 U2 with the latest patch, and vCenter is also on 6.5 U2, but I still get the same error.
I used the commands below to try to fix the issue, but they did not work.
On each host in the cluster, from an SSH session, run:
/etc/init.d/vsanmgmtd restart
/etc/init.d/vsanvpd restart
On vCenter, through an SSH session, restart vSAN Health:
service-control --stop vmware-vsan-health
service-control --start vmware-vsan-health
service-control --stop vmware-sps
service-control --start vmware-sps
I really don't know what to do. Do you have any recommendations or help?
Thanks
Alex
Hello Alex,
What I find strange there is the seemingly conflicting information - the 'Stats DB Object' check is green (which includes the Object health check) but the Object is not found when you try to disable the service. Can you check that the Proactive VM creation test passes just in case you are having a VASA/SPBM issue here?:
Cluster > Monitor > vSAN > Proactive Tests > VM Creation Test
If this is okay, I would advise starting with manually checking the state, component locations and Owner of this Object via RVC:
> vsan.perf.stats_object_info <PathToCluster>
The Object that the Performance service writes to (Stats DB Object) can also be removed and recreated via RVC (you can see all the options by typing vsan.perf.stats_object_ TAB TAB).
Bob
Hello Bob
Thanks for your reply. VM creation is working fine, and based on your advice I logged in to RVC.
Checking the stats object state does not show me an error, but when I try to delete it with vsan.perf.stats_object_delete it shows an error:
/computers> vsan.perf.stats_object_delete 1
Deleting vSAN Stats DB object, which will stop vSAN Performance Service ...
Task: Disable vSAN performance service
New progress: 1%
Task result: error
Hello Alex,
And what output do you see when you run vsan.perf.stats_object_info?
If it still exists and is unhealthy it may be possible to manually delete the Stats DB Object using Objtool (obviously make sure you are deleting the correct Object) or remediate it otherwise via abdication.
If you SSH to one of the hosts and go down to the vsanDatastore (/vmfs/volumes/vsanDatastore with default name) and use ls -lah, do you see a .vsan.stats symlink? And if so, can you cd into the namespace it points to?
What does the state of the components look like when you query the UUID of the namespace with # cmmds-tool find -t DOM_OBJECT -f json -u <UUID> (and/or using vsan.object_info <PathToCluster> <UUID> )?
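The symlink check described above can be wrapped in a small helper. This is only a sketch: the function name stats_link_target is made up for illustration, and the default path is the default datastore name mentioned above, which may differ in your environment.

```shell
#!/bin/sh
# Sketch: report where the .vsan.stats symlink in a datastore directory points.
# stats_link_target is a hypothetical helper name, not a vSAN tool.
stats_link_target() {
    dir="${1:-/vmfs/volumes/vsanDatastore}"   # assumes the default datastore name
    link="$dir/.vsan.stats"
    if [ -L "$link" ]; then
        # Print the namespace directory the symlink points to.
        readlink "$link"
    else
        echo "no .vsan.stats symlink in $dir" >&2
        return 1
    fi
}
```

Run it without arguments on a host (or pass the datastore path), then cd into the directory it prints to see whether the namespace is accessible.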
Bob
Hello Bob
Here is the vsan.perf.stats_object_info output:
> vsan.perf.stats_object_info 1
Directory Name: .vsan.stats
vSAN Object UUID: 1127545b-6385-8e8f-e1de-ecb1d7afa240
SPBM Profile: vSAN Default Storage Policy
vSAN Policy: cacheReservation: 0, checksumDisabled: 0, spbmProfileName: vSAN Default Storage Policy, spbmProfileId: aa6d5a82-1c88-45da-85d3-3d74b91a5bad, CSN: 166, proportionalCapacity: 0, spbmProfileGenerationNumber: 2, stripeWidth: 1, forceProvisioning: 0, SCSN: 163, hostFailuresToTolerate: 1
vSAN Object Health: healthy
DOM Object: 1127545b-6385-8e8f-e1de-ecb1d7afa240 (v5, owner: esx-vsan2.XXXX, proxy owner: None, policy: hostFailuresToTolerate = 1, CSN = 166, spbmProfileName = vSAN Default Storage Policy, forceProvisioning = 0, proportionalCapacity = 0, cacheReservation = 0, SCSN = 163, spbmProfileGenerationNumber = 2, spbmProfileId = aa6d5a82-1c88-45da-85d3-3d74b91a5bad, stripeWidth = 1, checksumDisabled = 0)
RAID_1
Component: 8b7ccc5b-4731-cca3-9b35-ecb1d7afa960 (state: ACTIVE (5), host: esx-vsan4.XXX.com, md: naa.5000c5008f90314b, ssd: naa.500a07511077a844,
votes: 1, usage: 4.5 GB, proxy component: false)
Component: 53ced25b-9fb5-6c65-1c2b-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan2.XXXX.com, md: naa.5000c5008f6f2c03, ssd: naa.5002538c40189cf8,
votes: 1, usage: 4.5 GB, proxy component: false)
Witness: a8ced25b-3214-42ae-d8b1-ecb1d7afa710 (state: ACTIVE (5), host: esx-vsan3.XXX.com, md: naa.5000c5008f227e4b, ssd: naa.5002538c40079cc6,
votes: 1, usage: 0.0 GB, proxy component: false)
Extended attributes:
Address space: 273804165120B (255.00 GB)
Object class: vmnamespace
Object path: /vmfs/volumes/vsan:5233711eafa2f823-3a7993e596f4f532/
Object capabilities: NONE
When I go to the .vsan.stats directory, there are no files:
ls: ./stats.db.lck: No such file or directory
ls: ./stats.db.tpl: No such file or directory
Actually, I had manually removed all the files earlier to see if that would solve the problem, but it did not fix the issue.
We do not see any issue with the stats DB object itself. The components appear to be healthy, so this may just be an issue with the service. You could now attempt to delete the stats DB as suggested by Bobkin and then re-enable the performance service.
1. Check and confirm that the referenced object UUID is indeed the vSAN stats DB by looking under "path" or "user friendly name" in the output of:
/usr/lib/vmware/osfs/bin/objtool getAttr -u 1127545b-6385-8e8f-e1de-ecb1d7afa240
2. Once you have confirmed the object is indeed the stats DB, run the command below to delete it. Make sure the UUID is correct and is the object ID of the stats DB; once deleted, the object cannot be recovered.
/usr/lib/vmware/osfs/bin/objtool delete -u 1127545b-6385-8e8f-e1de-ecb1d7afa240 -f
If it fails, try on the owner host vsan2.
3. After the deletion, re-enable the vSAN performance service.
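The check-then-delete sequence in steps 1 and 2 can be guarded so that the delete only runs when the attributes actually mention the stats DB. What follows is a rough sketch, not a supported procedure: the function names are invented here, and it assumes (the exact getAttr output format is not shown in this thread) that the "path" or "user friendly name" line contains the text .vsan.stats when the object is the stats DB.

```shell
#!/bin/sh
OBJTOOL=/usr/lib/vmware/osfs/bin/objtool

# Reads getAttr output on stdin. Succeeds only if a "path" or
# "user friendly name" line mentions .vsan.stats (assumed format).
looks_like_stats_db() {
    grep -i -E '(path|user friendly name)' | grep -q -F '.vsan.stats'
}

# Hypothetical wrapper: force-deletes the object only if the check passes.
# Nothing here runs until you call delete_stats_db <UUID> yourself.
delete_stats_db() {
    uuid="$1"
    if "$OBJTOOL" getAttr -u "$uuid" | looks_like_stats_db; then
        "$OBJTOOL" delete -u "$uuid" -f
    else
        echo "refusing to delete $uuid: not identified as the stats DB" >&2
        return 1
    fi
}
```

The guard is deliberately conservative: if the attribute lines do not match, nothing is deleted and you can inspect the getAttr output by hand.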
Thanks
Thanks, VPradeep, it's solved with your advice.
Thanks Bob, you also helped me a lot to get to the end.
Hi
I'm having a very similar error with my home lab vSAN setup. I have a constant "Stats primary election" error in Skyline Health/Performance service. Tried disabling/enabling Performance service, tried removing all hosts from cluster, deleting/creating cluster and starting fresh, always get "Stats primary election error". Everything else appears healthy. Feels like old objects are still on the vSAN datastore (.vsan.stats, and two older vCLS perf VMs). Even with vSAN Performance Service disabled, I still see this:
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] ls -al
ls: ./vCLS-73947cf9-2697-4f64-992a-f5fa0c366729: Input/output error
ls: ./78d23563-5d54-b305-bb98-0050569a1de6: Input/output error
ls: ./.vsan.stats: Input/output error
ls: ./5ed23563-55ac-861c-e0db-0050569a20c1: Input/output error
ls: ./vCLS-27868099-a321-4ebc-81ce-0463c1e8ec10: Input/output error
ls: ./b9d23563-f802-e072-e9ee-0050569a20c1: Input/output error
total 1024
drwxr-xr-x 1 root root 512 Sep 30 13:55 .
drwxr-xr-x 1 root root 512 Sep 30 13:55 ..
drwxr-xr-t 1 root root 77824 Sep 29 17:16 9bd23563-34fd-a4a8-014d-0050569a12ed
lrwxr-xr-x 1 root root 36 Sep 30 13:55 vCLS-0d4f9299-9bf1-4aa5-a7ba-6f577f32850b -> 9bd23563-34fd-a4a8-014d-0050569a12ed
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]
If I try to remove the .vsan.stats object (which I'm not sure is really there), I get:
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool getAttr -u 5ed23563-55ac-861c-e0db-0050569a20c1
Failed to get object attributes : Input/output error 327684.
object getAttr error: Failure
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180] /usr/lib/vmware/osfs/bin/objtool delete -u 5ed23563-55ac-861c-e0db-0050569a20c1
object deletion ioctl failed: Input/output error
object delete error: Failure
[root@vesxi-01:/vmfs/volumes/vsan:52f295bfcfc07e3e-57fc0517ff81c180]
Help!
@StephenS, try deleting it with the -f (force) flag, and maybe from the node that is currently its DOM owner.
If a new stats object then gets created successfully but still no master is elected or no stats are contributed, validate that TCP port 80 is open between the nodes' vSAN IPs.
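One way to run that port check against every peer in one go is a small loop around nc -z. This is just a sketch: check_stats_port is a made-up helper name, and the IPs you pass in are whatever vSAN IPs your other nodes use.

```shell
#!/bin/sh
# Sketch: probe TCP port 80 on each given vSAN IP and report the result.
# check_stats_port is a hypothetical helper; pass the peer IPs as arguments.
check_stats_port() {
    failed=0
    for ip in "$@"; do
        if nc -z "$ip" 80 >/dev/null 2>&1; then
            echo "$ip:80 reachable"
        else
            echo "$ip:80 NOT reachable"
            failed=1
        fi
    done
    # Non-zero exit status if any peer could not be reached.
    return "$failed"
}
```

Call it from each host with the vSAN IPs of the other hosts as arguments. A non-zero exit status means at least one peer did not answer on port 80.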
Here's what I get trying to force delete the object:
[root@vesxi-02:~] /usr/lib/vmware/osfs/bin/objtool delete -u 30393763-dca4-0ab0-7099-0050569a12ed -f -v 10
object deletion ioctl failed: Input/output error
object delete error: Failure
[root@vesxi-02:~]
Maybe unrelated, but it seems like every time I run this, I get a new additional Inaccessible Object showing in the Cluster / Monitor / vSAN / Virtual Objects. Trying the same command above with any of them results in the same failure.
As for the Stats error, a new stats object is indeed getting created successfully, but still no master is getting elected, and I tried this:
[root@vesxi-01:~] nc -z 192.168.3.102 80
Connection to 192.168.3.102 80 port [tcp/http] succeeded!
[root@vesxi-01:~] nc -z 192.168.3.103 80
Connection to 192.168.3.103 80 port [tcp/http] succeeded!
[root@vesxi-01:~] nc -z 192.168.3.104 80
Connection to 192.168.3.104 80 port [tcp/http] succeeded!
[root@vesxi-01:~]
Any other thoughts??