Re: Strange vSAN behavior? Should I worry about it?

wgreb · ‎01-16-2015

Hello,

Recently I have been facing strange problems with vSAN.

There are many strange lines in the clomd.log file, for example:

2015-01-16T20:30:38Z clomd[17349684]: CLOMReplaceUnhealthyComponent:Fixing component 62d03c54-1097-3561-a6fe-0cc47a120cea state 9

2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMAddWitness:Couldn't find disk for the witness: Not found

2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMSetQuorumVotes:Can't allocate witnesses: Not found

2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMReconfigure:exit: obj bf013554-a483-c6f2-5875-0cc47a120cea configDelay 0 newConfigGenerated 0 status Not found
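To see which CLOM functions are complaining and which objects end with "status Not found", a quick summary of clomd.log can help. The sketch below is a minimal example under assumptions: the regex is inferred only from the four lines quoted above, not from any documented clomd.log format, and the function name summarize_clomd is hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern inferred from the clomd.log lines quoted in this post:
# timestamp, "clomd[pid]:", an optional "I120:"-style tag, then "CLOMFunction:message".
CLOMD_RE = re.compile(
    r"^(?P<ts>\S+)\s+clomd\[(?P<pid>\d+)\]:\s+(?:I\d+:\s+)?(?P<func>CLOM\w+):(?P<msg>.*)$"
)


def summarize_clomd(lines):
    """Count messages per CLOM function and collect object UUIDs whose status is 'Not found'."""
    per_func = Counter()
    not_found_objs = []
    for line in lines:
        m = CLOMD_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like the quoted clomd entries
        per_func[m.group("func")] += 1
        obj = re.search(r"\bobj (\S+)", m.group("msg"))
        if obj and "status Not found" in m.group("msg"):
            not_found_objs.append(obj.group(1))
    return per_func, not_found_objs


# The four lines from the post, used as sample input.
sample = [
    "2015-01-16T20:30:38Z clomd[17349684]: CLOMReplaceUnhealthyComponent:Fixing component "
    "62d03c54-1097-3561-a6fe-0cc47a120cea state 9",
    "2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMAddWitness:Couldn't find disk for the witness: Not found",
    "2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMSetQuorumVotes:Can't allocate witnesses: Not found",
    "2015-01-16T20:30:38Z clomd[17349684]: I120: CLOMReconfigure:exit: obj bf013554-a483-c6f2-5875-0cc47a120cea "
    "configDelay 0 newConfigGenerated 0 status Not found",
]

counts, objs = summarize_clomd(sample)
print(dict(counts))
print(objs)
```

Run against a full clomd.log (read with open(...).readlines()), the object list gives the UUIDs whose reconfiguration failed, which can then be matched against the non-compliant VMs.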

Additionally, a few VMs show their VM storage policy as "Not Compliant". If I force the "Check storage policy compliance" action, I get an error:

/vmfs/volumes/vsan:5262b32c49c45e2b-93c33f2f9b6f13f7/5b204854-bada-d675-b055-0cc47a1210ca/DB_test.vmdk - was not found.

Have any of you faced these problems? If so, do you know what causes them, and what solution or workaround would you recommend?

I would appreciate any help.

Best regards,

wgreb · ‎01-16-2015

Hello again,

I found that one of the SSDs is in degraded mode, and one of the ESXi hosts has many lines like the following in vmkernel.log:

2015-01-16T21:53:58.219Z cpu12:33098)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: naa.50025388700159bd:2

2015-01-16T21:54:00.007Z cpu8:1532934)World: 14299: VC opID hostd-60e7 maps to vmkernel opID 85c66937

2015-01-16T21:54:20.007Z cpu20:1992259)World: 14299: VC opID hostd-fe2a maps to vmkernel opID f1cb6434
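When vmkernel.log is long, it helps to count how often each device reports the throttled SSDLOG_WriteLogEntry error, so you know which naa ID to investigate. This is a minimal sketch under assumptions: the pattern is inferred from the lines quoted above, the trailing ":2" after the device name is assumed to be a partition number, and count_ssdlog_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the vmkernel.log lines quoted above (an assumption, not a
# documented log format). The suffix after the last colon is assumed to be a partition.
SSDLOG_RE = re.compile(
    r"SSDLOG_WriteLogEntry:\d+: Throttled: Log has encountered "
    r"\((?P<reason>[^)]*)\) error device: (?P<device>naa\.[0-9a-f]+):(?P<part>\d+)"
)


def count_ssdlog_errors(lines):
    """Return a Counter keyed by (device, partition, reason) for throttled SSD log write errors."""
    hits = Counter()
    for line in lines:
        m = SSDLOG_RE.search(line)
        if m:
            hits[(m.group("device"), int(m.group("part")), m.group("reason"))] += 1
    return hits


# Lines from the post; the "World: ... opID" lines are ignored by the pattern.
sample = [
    "2015-01-16T21:53:58.219Z cpu12:33098)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: "
    "Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: "
    "naa.50025388700159bd:2",
    "2015-01-16T21:54:00.007Z cpu8:1532934)World: 14299: VC opID hostd-60e7 maps to vmkernel opID 85c66937",
    "2015-01-16T21:54:20.007Z cpu20:1992259)World: 14299: VC opID hostd-fe2a maps to vmkernel opID f1cb6434",
]

errors = count_ssdlog_errors(sample)
for (device, part, reason), n in errors.items():
    print(f"{device} partition {part}: {n} x {reason}")
```

The device name this reports is the one to pass to the esxcli smart query below.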

In the vSAN disk group view, the health status shows "Permanent disk failure".

However, running esxcli storage core device smart get -d naa.50025388700159bd shows that the health status is OK and the disk works properly.

What can be done to solve this issue? I am now unsure whether I should go to the shop and buy a new SSD or whether this is a vSAN issue.

Best regards,

WG

wgreb · ‎02-05-2015

Guys,

The problem has recurred, this time on a different host, with the same lines in the log files:

2015-02-05T07:56:13.845Z cpu18:34295)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: naa.50025388700159c0:2

2015-02-05T07:56:20.008Z cpu6:26931657)World: 14299: VC opID hostd-6e1b maps to vmkernel opID 6d621b94

2015-02-05T07:56:40.007Z cpu22:26997194)World: 14299: VC opID hostd-63bc maps to vmkernel opID d250dbcc

2015-02-05T07:56:43.848Z cpu14:34617)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: naa.50025388700159c0:2

2015-02-05T07:57:00.010Z cpu2:34295)World: 14299: VC opID hostd-c62f maps to vmkernel opID e1d3f9f

2015-02-05T07:57:13.852Z cpu3:32771)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: naa.50025388700159c0:2

2015-02-05T07:57:16.567Z cpu36:34246)World: 14299: VC opID hostd-890c maps to vmkernel opID f9eca663

2015-02-05T07:57:20.009Z cpu4:34249)World: 14299: VC opID hostd-0348 maps to vmkernel opID 43fe0e1d

2015-02-05T07:57:26.427Z cpu36:34246)World: 14299: VC opID hostd-26ef maps to vmkernel opID 2790db00

2015-02-05T07:57:36.491Z cpu16:685124)World: 14299: VC opID hostd-ba8d maps to vmkernel opID a0e9e15d

2015-02-05T07:57:40.006Z cpu0:34249)World: 14299: VC opID hostd-f8ca maps to vmkernel opID 6eca1e78

2015-02-05T07:57:43.855Z cpu10:26889027)WARNING: LSOMCommon: SSDLOG_WriteLogEntry:599: Throttled: Log has encountered (Maximum kernel-level retries exceeded) error device: naa.50025388700159c0:2

2015-02-05T07:58:00.009Z cpu27:34290)World: 14299: VC opID hostd-f34a maps to vmkernel opID d551dd22
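One detail worth checking in this excerpt is how regularly the SSDLOG_WriteLogEntry warnings repeat. The sketch below simply computes the gaps between the four warning timestamps quoted above; it assumes the vmkernel timestamp layout shown in these lines, and intervals_seconds is a hypothetical helper name.

```python
from datetime import datetime

# Timestamps of the SSDLOG_WriteLogEntry warnings for naa.50025388700159c0,
# copied from the vmkernel.log excerpt above.
warning_times = [
    "2015-02-05T07:56:13.845Z",
    "2015-02-05T07:56:43.848Z",
    "2015-02-05T07:57:13.852Z",
    "2015-02-05T07:57:43.855Z",
]


def intervals_seconds(stamps):
    """Return the gaps in seconds between consecutive vmkernel-style UTC timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


gaps = intervals_seconds(warning_times)
print(gaps)
```

For these lines the gaps come out at roughly 30 seconds each. That regularity is at least consistent with the "Throttled" wording, i.e. a persisting error being re-reported periodically rather than isolated events, though that is an interpretation and not something confirmed in this thread.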

Do you know what is wrong? I got no vCenter alarms, and the vSAN cluster view in vCenter shows all disks as fine. Only the disk group view shows a problem with one of the SSDs, and a reboot solved the problem.

I would much appreciate any advice or help.

Best regards,

wgreb

depping · ‎02-11-2015

This definitely seems to indicate that there is something wrong with that device. I would recommend calling VMware Support and letting them figure out exactly what is wrong.

Also, are you running certified disks, SSDs, and disk controllers? Did you check whether you are running the right firmware and driver versions?
