VMware Cloud Community
wmarusiak
Enthusiast

Single node vSAN cluster boot error after upgrade from 6.5 to 6.7

Hi,

I have a single-node vSAN cluster that I use as a homelab for many nested environments. I upgraded ESXi 6.5 to 6.7 without any issues, but I noticed that during boot there is an error about recovery progress - see the attached screenshot.

[Attached screenshot: 2018-09-04 22_07_19-Resolution_1024x768 FPS _10.png]

I also found something related in the logs:

2018-09-04T20:07:53.291Z cpu0:2097506)WARNING: Local node faultDomain ID is changed from 00000000-0000-0000-0000-000000000000 to 59b83712-415a-dc44-993d-0025905e041e

2018-09-04T20:07:54.305Z cpu5:2097506)WARNING: LSOMCommon: LSOM_DiskGroupCreate:1481: Disk group already created uuid: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-04T20:07:54.313Z cpu5:2097316)WARNING: NFS: 1227: Invalid volume UUID t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2

2018-09-04T20:07:54.396Z cpu0:2097506)WARNING: NFS: 1227: Invalid volume UUID 59b837bc-f856ecbc-0c43-0025905e041e

2018-09-04T20:08:07.381Z cpu0:2099225)WARNING: NTPClock: 1561: system clock synchronized to upstream time servers

2018-09-04T20:08:08.281Z cpu1:2099253)WARNING: LSOMCommon: LSOM_DiskGroupCreate:1481: Disk group already created uuid: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-04T20:08:08.289Z cpu4:2097312)WARNING: NFS: 1227: Invalid volume UUID t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2

2018-09-04T20:08:08.510Z cpu4:2099257)WARNING: LSOMCommon: LSOM_DiskGroupCreate:1481: Disk group already created uuid: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-04T20:08:08.519Z cpu1:2097312)WARNING: NFS: 1227: Invalid volume UUID t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2

2018-09-04T20:08:09.711Z cpu2:2099345)WARNING: LSOMCommon: LSOM_DiskGroupCreate:1481: Disk group already created uuid: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-04T20:08:09.719Z cpu0:2097310)WARNING: NFS: 1227: Invalid volume UUID t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2

After boot ESXi is OK and the vSAN status looks good.

[root@ESXi:/vsantraces] esxcli vsan storage list

t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00

   Device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00

   Display Name: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00

   Is SSD: true

   VSAN UUID: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

   VSAN Disk Group UUID: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

   VSAN Disk Group Name: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00

   Used by this host: true

   In CMMDS: true

   On-disk format version: 5

   Deduplication: false

   Compression: false

   Checksum: 3722446061210474928

   Checksum OK: true

   Is Capacity Tier: false

   Encryption: false

   DiskKeyLoaded: false

   Is Mounted: true

t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144

   Device: t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144

   Display Name: t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144

   Is SSD: true

   VSAN UUID: 52c93db2-c879-ce09-ea2a-664a0ae10485

   VSAN Disk Group UUID: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

   VSAN Disk Group Name: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00

   Used by this host: true

   In CMMDS: true

   On-disk format version: 5

   Deduplication: false

   Compression: false

   Checksum: 12121898985191738183

   Checksum OK: true

   Is Capacity Tier: true

   Encryption: false

   DiskKeyLoaded: false

   Is Mounted: true

[root@ESXi:~] esxcli vsan cluster get

Cluster Information

   Enabled: true

   Current Local Time: 2018-09-04T20:11:10Z

   Local Node UUID: 59b83712-415a-dc44-993d-0025905e041e

   Local Node Type: NORMAL

   Local Node State: MASTER

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 59b83712-415a-dc44-993d-0025905e041e

   Sub-Cluster Backup UUID:

   Sub-Cluster UUID: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

   Sub-Cluster Membership Entry Revision: 0

   Sub-Cluster Member Count: 1

   Sub-Cluster Member UUIDs: 59b83712-415a-dc44-993d-0025905e041e

   Sub-Cluster Membership UUID: 22e58e5b-16de-fb32-0412-0025905e04e4

   Unicast Mode Enabled: true

   Maintenance Mode State: OFF

   Config Generation: None 0 0.0

Any idea how to troubleshoot?

Cheers
Wojciech

Best Regards, Wojciech https://wojcieh.net
3 Replies
GreatWhiteTec
VMware Employee

Any resyncing components? My guess is that there may be a rebalance operation going on.
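For a quick check from the ESXi shell, the resync state can be queried with esxcli - a minimal sketch, assuming the 6.7 `esxcli vsan debug` namespaces; the sample output below is a made-up stand-in for a quiet cluster, not captured from a live host:

```shell
# On a live 6.7 host you would run (exact output format varies by build):
#
#   esxcli vsan debug resync summary get
#
# On a one-node cluster with a single capacity device there is nowhere to
# resync to, so the expected totals are zero. Below, a hypothetical sample
# of that output is parsed to pull out the object count:
summary='Total Number Of Resyncing Objects: 0
Total Bytes Left To Resync: 0'

# Split each line on ": " and print the value for the resyncing-objects row
resyncing=$(printf '%s\n' "$summary" | awk -F': ' '/Resyncing Objects/ {print $2}')
echo "$resyncing"   # → 0
```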

TheBobkin
Champion

Hello Wojciech,

The display messages for PLOG recovery during boot (e.g. while initializing the disk groups) now have a percentage number added - this doesn't indicate an issue.

You can verify that this process completed by looking in boot.gz or vmkernel.log for the PLOG recovery being successful.
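A quick way to do that from the ESXi shell - a sketch, assuming the usual 6.x log locations; the sample file below is a made-up stand-in modeled on the log lines in this thread, so the grep pattern can be seen in isolation:

```shell
# On a live host (boot.gz is compressed, so decompress before grepping):
#
#   zcat /var/run/log/boot.gz | grep -i plog | grep -i recover
#   grep -i "Recovery complete" /var/log/vmkernel.log
#
# The entry to look for is an LSOMCommon "Recovery complete with Success"
# line. Here it is written to a sample file and counted:
printf '%s\n' \
  '2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: SSDLOGInitDescForIO:803: Recovery complete with Success... device: t10.NVMe...:2' \
  > /tmp/vmkernel.sample.log
grep -c 'Recovery complete with Success' /tmp/vmkernel.sample.log   # → 1
```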

GreatWhiteTec - how much data do you think could be out of sync or out of balance in a one-node cluster with a single capacity-tier device? 😉

Bob

wmarusiak
Enthusiast

Hi,

I checked vmkernel.log during boot, and it doesn't look good.

[Attached screenshot: 2018-09-16 21_55_43-Resolution_1024x768 FPS _29.png]

These are the last entries I see in vmkernel.log:

2018-09-16T18:32:18.733Z cpu4:2099026)PLOG: PLOGMapMetadataPartition:2607: SSD acks (2) 1 healthy MDs

2018-09-16T18:32:18.735Z cpu4:2099026)PLOG: PLOGProbeDevice:5728: Probed plog device <t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:1> 0x430ab0920d40 exists.. continue with old entry

2018-09-16T18:32:18.735Z cpu4:2099026)WARNING: LSOMCommon: LSOM_DiskGroupCreate:1481: Disk group already created uuid: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-16T18:32:18.735Z cpu4:2099026)LSOMCommon: SSDLOG_AddDisk:877: Existing ssd found t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

2018-09-16T18:32:18.735Z cpu4:2099026)PLOG: PLOGAnnounceSSD:7268: Successfully added VSAN SSD (t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2) with UUID 522288b0-d99b-a8f8-6dd5-44b7ef7354e4. kt 1, en 0, enC 0.

2018-09-16T18:32:18.735Z cpu4:2099026)VSAN: Initializing SSD: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 Please wait...

2018-09-16T18:32:18.736Z cpu5:2098374)PLOG: PLOGNotifyDisks:4546: MD 0 with UUID 52c93db2-c879-ce09-ea2a-664a0ae10485 with state 0 formatVersion 5 backing SSD 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 notified

2018-09-16T18:32:18.736Z cpu5:2098374)PLOG: PLOG_Recover:884: !!!! SSD 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 already recovered

2018-09-16T18:32:18.736Z cpu4:2099026)VSAN: Successfully Initialized: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-16T18:32:18.736Z cpu4:2099026)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.737Z cpu4:2099026)PLOG: PLOGInitAndAnnounceMD:7737: Successfully announced VSAN MD (t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2) with UUID: 52c93db2-c879-ce09-ea2a-664a0ae10485. kt 1, en 0, enC 0.

2018-09-16T18:32:18.821Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.821Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.821Z cpu2:2097312)Vol3: 2674: Could not open device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2' for probing: Busy

2018-09-16T18:32:18.821Z cpu2:2097312)WARNING: NFS: 1227: Invalid volume UUID t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2

2018-09-16T18:32:18.821Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.822Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.822Z cpu2:2097312)Vol3: 1201: Could not open device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2' for volume open: Busy

2018-09-16T18:32:18.822Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.822Z cpu2:2097312)Vol3: 1201: Could not open device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2' for volume open: Busy

2018-09-16T18:32:18.823Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.823Z cpu2:2097312)Vol3: 1201: Could not open device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2' for volume open: Busy

2018-09-16T18:32:18.823Z cpu2:2097312)PLOG: PLOGOpenDevice:4238: Disk handle open failure for device t10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2, status:Busy

2018-09-16T18:32:18.823Z cpu2:2097312)Vol3: 1201: Could not open device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2' for volume open: Busy

2018-09-16T18:32:18.823Z cpu2:2097312)FSS: 6092: No FS driver claimed device 't10.ATA_____Crucial_CT2050MX300SSD1_________________________1651150F2144:2': No filesystem on the device

2018-09-16T18:32:19.077Z cpu3:2099026)VC: 4616: Device rescan time 102 msec (total number of devices 6)

2018-09-16T18:32:19.077Z cpu3:2099026)VC: 4619: Filesystem probe time 253 msec (devices probed 4 of 6)

2018-09-16T18:32:19.077Z cpu3:2099026)VC: 4621: Refresh open volume time 3 msec

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:channel=0, target=2, lun=0, action=0

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:No media

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:channel=0, target=3, lun=0, action=0

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:No media

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:channel=0, target=4, lun=0, action=0

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:No media

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:channel=0, target=5, lun=0, action=0

2018-09-16T18:32:19.203Z cpu5:2099027)vmw_ahci[0000001f]: scsiDiscover:No media

And some more here:

2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: SSDLOGInitDescForIO:803: Recovery complete with Success... device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

2018-09-16T18:27:34.266Z cpu1:2097600)ScsiDeviceIO: 3015: Cmd(0x459a40fae580) 0x85, CmdSN 0x0 from world 2097505 to dev "t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: LSOM_RegisterDiskAttrHandle:124: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2 is a non-SATA disk

2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: LSOM_RegisterDiskAttrHandle:131: DiskAttrHandle:0x4313fcf6c7a8 is added to disk:t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2 by module:lsomcommon

2018-09-16T18:27:34.266Z cpu2:2097506)PLOG: PLOG_DeviceCreateSSDLogHandle:1096: Registered APD callback for 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: IORETRY_Create:2585: An IORETRY queue for diskUUID 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 (0x4313fcf7e850) is NOT encrypted

2018-09-16T18:27:34.266Z cpu2:2097506)LSOMCommon: IORETRY_Create:2624: Queue Depth for device 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 set to 920

2018-09-16T18:27:34.266Z cpu2:2097506)ScsiEvents: 300: EventSubsystem: Device Events, Event Mask: 20, Parameter: 0x4313fcf7e850, Registered!

2018-09-16T18:27:34.267Z cpu2:2097506)Created VSAN Slab PLOGIORetry_slab_0000000000 (objSize=272 align=64 minObj=2500 maxObj=25000 overheadObj=0 minMemUsage=836k maxMemUsage=8336k)

2018-09-16T18:27:34.267Z cpu2:2097506)Created VSAN Slab PLOGIORetry_slab_0000000001 (objSize=272 align=64 minObj=2500 maxObj=25000 overheadObj=0 minMemUsage=836k maxMemUsage=8336k)

2018-09-16T18:27:35.267Z cpu2:2097506)PLOG: PLOGAnnounceSSD:7253: Trace task started for device 522288b0-d99b-a8f8-6dd5-44b7ef7354e4

2018-09-16T18:27:35.267Z cpu2:2097506)PLOG: PLOGAnnounceSSD:7268: Successfully added VSAN SSD (t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2) with UUID 522288b0-d99b-a8f8-6dd5-44b7ef7354e4. kt 1, en 0, enC 0.

2018-09-16T18:27:35.267Z cpu2:2097506)VSAN: Initializing SSD: 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 Please wait...

2018-09-16T18:27:35.268Z cpu4:2098374)PLOG: PLOGNotifyDisks:4546: MD 0 with UUID 52c93db2-c879-ce09-ea2a-664a0ae10485 with state 0 formatVersion 5 backing SSD 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 notified

2018-09-16T18:27:35.268Z cpu4:2098374)VSANServer: VSANServer_InstantiateServer:3380: Instantiated VSANServer 0x430ab0922e18

2018-09-16T18:27:35.269Z cpu5:2098134)Created VSAN Slab RcSsdParentsSlab_0x4313fcf83200 (objSize=208 align=64 minObj=2500 maxObj=25000 overheadObj=0 minMemUsage=668k maxMemUsage=6668k)

2018-09-16T18:27:35.270Z cpu5:2098134)Created VSAN Slab RcSsdIoSlab_0x4313fcf83200 (objSize=65536 align=64 minObj=64 maxObj=25000 overheadObj=0 minMemUsage=4352k maxMemUsage=1700000k)

2018-09-16T18:27:35.270Z cpu5:2098134)Created VSAN Slab RcSsdMdBElemSlab_0x4313fcf83200 (objSize=32 align=64 minObj=4 maxObj=4096 overheadObj=0 minMemUsage=4k maxMemUsage=264k)

2018-09-16T18:27:35.270Z cpu5:2098134)Created VSAN Slab RCInvBmapSlab_0x4313fcf83200 (objSize=56 align=64 minObj=1 maxObj=1 overheadObj=14 minMemUsage=4k maxMemUsage=4k)

2018-09-16T18:27:35.270Z cpu2:2098386)Global: Virsto_CreateInstance:163: INFO: Create new Virsto instance (heapName: virstoInstance_00000001)

2018-09-16T18:27:35.272Z cpu2:2098386)DOM: DOMDisk_GetServer:259: disk-group w/ SSD 522288b0-d99b-a8f8-6dd5-44b7ef7354e4 on dom/comp server 0

2018-09-16T18:27:35.272Z cpu2:2098386)LSOM: LSOMSendDiskStatusEvent:5424: Throttled: Unable to post disk status event for disk 522288b0-d99b-a8f8-6dd5-44b7ef7354e4: Not found

2018-09-16T18:27:35.447Z cpu0:2098099)LSOMCommon: SSDLOGLogEnumProgress:1406: Recovery progress: 1500 of ~502343 (0%) log blocks. 0s so far, ~58s left. device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

vSAN CacheDisk t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2: Recovery progress: 1500 of ~502343 (0%)

2018-09-16T18:27:35.612Z cpu3:2098099)LSOMCommon: SSDLOGLogEnumProgress:1406: Recovery progress: 3000 of ~502343 (0%) log blocks. 0s so far, ~56s left. device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

vSAN CacheDisk t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2: Recovery progress: 3000 of ~502343 (0%)

2018-09-16T18:27:35.775Z cpu2:2098099)LSOMCommon: SSDLOGLogEnumProgress:1406: Recovery progress: 4500 of ~502343 (0%) log blocks. 0s so far, ~55s left. device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

vSAN CacheDisk t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2: Recovery progress: 4500 of ~502343 (0%)

2018-09-16T18:27:35.937Z cpu2:2098099)LSOMCommon: SSDLOGLogEnumProgress:1406: Recovery progress: 6000 of ~502343 (1%) log blocks. 0s so far, ~55s left. device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

vSAN CacheDisk t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2: Recovery progress: 6000 of ~502343 (1%)

2018-09-16T18:27:36.102Z cpu3:2098099)LSOMCommon: SSDLOGLogEnumProgress:1406: Recovery progress: 7500 of ~502343 (1%) log blocks. 0s so far, ~54s left. device: t10.NVMe____WDC_WDS256G1X0C2D00ENX0__________________B10A46444A441B00:2

Any idea what might be going on?

Best Regards, Wojciech https://wojcieh.net