ahmad090
Enthusiast

vSAN datastore not browsable after uploading ISO file


hi,

I created a stretched vSAN 6.6 cluster (2+2+1).

I used the latest HPE ESXi 6.5 U1 image, with VCSA.

The health check was OK and everything was set up correctly.

Once I uploaded a Windows Server 2012 R2 ISO file to the vSAN datastore, the datastore started acting weirdly: I couldn't browse it any more, and when I created an empty VM I couldn't delete it (the operation failed with a timeout error).

Is there any issue/bug with uploading ISO images to a vSAN datastore?

please help.

Accepted Solution
ahmad090
Enthusiast

hi,

The problem was solved.

When I changed the vDS and vmkernels to MTU 1500, vSAN worked successfully.

Then we noticed that the jumbo frame setting on the physical switches was 9000, when it should have been 9216.

Once we changed the MTU on the physical switches to 9216 and I reverted the vDS and VMkernels to 9000, vSAN worked properly.
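The 9000 vs. 9216 gap makes sense if you consider that the vmkernel MTU counts only the IP packet, while many switch platforms configure the maximum *frame* size including L2 headers. A rough sketch of the arithmetic (the header sizes are the usual tagged-Ethernet values, assumed here rather than taken from this particular switch model):

```shell
VMK_MTU=9000        # MTU on the vDS / vmkernel ports (IP packet size, excludes L2 headers)
ETH_HEADER=14       # Ethernet header bytes
VLAN_TAG=4          # 802.1Q tag, if the vSAN VLAN is trunked
FCS=4               # frame check sequence
MIN_FRAME=$((VMK_MTU + ETH_HEADER + VLAN_TAG + FCS))
echo "$MIN_FRAME"   # 9022: a switch limit of 9000 *frame* bytes drops jumbo traffic
```

So a switch configured for 9000-byte frames cannot carry a full 9000-byte IP packet; on many platforms 9216 is simply the maximum jumbo frame size, which leaves comfortable headroom.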

thanks,

11 Replies
VI__ESX_3_5
Contributor

My suggestion would be to create a folder, say 'ISO', and then upload the image/ISO into that folder.

If it is already uploaded, you can move it into that ISO folder.

ahmad090
Enthusiast

hi

Already tried that, same result: the vSAN datastore is not browsable after uploading the ISO into a folder or any object folder.

TheBobkin
VMware Employee

Hello ahmad090,

Where are you seeing it as non-browsable?

- HTML5 Client?

- Web Client?

- SSH directly to host?

Is the host properly clustered? (If it is partitioned then the vsandatastore contents would not be accessible)

# localcli vsan cluster get

Web Client: Cluster > Monitor > Health > Cluster/Network

Bob

ahmad090
Enthusiast

hi,

It's non-browsable from all three of the mentioned connections.

It also acts weirdly: sometimes it browses the datastore, but I can't upload or delete objects/folders.

From SSH it sometimes gives this:

[root@Mercury:/vmfs/volumes/vsan:521873dee2c9844d-202591219df9317b] ls

ls: ./0d69d85a-521c-f0e1-7232-d06726ce446a: Connection timed out

ls: ./W12R2: Device or resource busy

OMN-BEI-PSC                           da4ee05a-48ff-1063-2ea7-d06726ce446a

OMN-BEI-VC                            e44cd75a-2806-5d72-6fa6-d06726ce91aa

a544d75a-cec4-42b1-c3d0-d06726ce91aa  iso

[root@Venus:~] localcli vsan cluster get

Cluster Information:

   Enabled: true

   Current Local Time: 2018-04-25T10:23:05Z

   Local Node UUID: 5ad83d66-78a3-4a3c-a9c3-d06726ce9576

   Local Node Type: NORMAL

   Local Node State: AGENT

   Local Node Health State: HEALTHY

   Sub-Cluster Master UUID: 5ad73439-e272-bb5e-f9b2-d06726ce91aa

   Sub-Cluster Backup UUID: 5ad83cbe-cfc8-9724-3da9-d06726ce4a82

   Sub-Cluster UUID: 521873de-e2c9-844d-2025-91219df9317b

   Sub-Cluster Membership Entry Revision: 58

   Sub-Cluster Member Count: 5

   Sub-Cluster Member UUIDs: 5ad73439-e272-bb5e-f9b2-d06726ce91aa, 5ad83cbe-cfc8-9724-3da9-d06726ce4a82, 5ad83d66-78a3-4a3c-a9c3-d06726ce9576, 5ad9c06e-c8bc-8896-ddbd-d06726ce446a, 5ad84c7e-ff80-7499-a7b0-005056842a39

   Sub-Cluster Membership UUID: 4e44d75a-6048-216a-cd15-d06726ce91aa

   Unicast Mode Enabled: true

   Maintenance Mode State: OFF

   Config Generation: eae14a71-c75d-4e02-afef-1d0436a05499 10 2018-04-20T11:09:21.959

TheBobkin
VMware Employee

Hello @ahmad090,

"Sub-Cluster Membership Entry Revision: 58"

This increments every time a host is added to or removed from the cluster, so the connection between the cluster members is potentially flapping.

Can you please check clomd.log for the following entries, which will indicate whether members are being added to or removed from cluster membership:

# less /var/log/clomd.log | grep CdbObjectNode

What warnings/errors do you see in Health checks via the Web Client as advised in my previous comment?

Bob

ahmad090
Enthusiast

hi

No output from this command:

less /var/log/clomd.log | grep CdbObjectNode

From the health check, the only error is a data health warning about some orphaned VMs not being protected (which doesn't matter).

Every other health check is OK.

The vDS health check is enabled and OK.

Pings between the vmkernels (vSAN, vMotion) are OK.

MTU 9000 is used.
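One caveat on the ping test: a default vmkping payload fits in a standard frame, so it can succeed even while jumbo frames are being dropped somewhere on the path, which can mask a switch-side MTU mismatch. To actually exercise MTU 9000 end to end, vmkping needs the don't-fragment flag and a jumbo-sized payload. A sketch (the vmk interface and peer IP are placeholders for this environment, so the command is only echoed here):

```shell
# A default vmkping (56-byte payload) proves nothing about jumbo frames.
# With -d (don't fragment), the ICMP payload must fit in a single frame:
MTU=9000
PAYLOAD=$((MTU - 20 - 8))   # minus IP (20) and ICMP (8) headers
# vmk2 and the target address below are placeholders:
echo "vmkping -I vmk2 -d -s ${PAYLOAD} <peer-vsan-vmk-ip>"
```

If this large-payload ping fails while a plain ping succeeds, jumbo frames are being dropped between the two vmkernels.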

TheBobkin
VMware Employee

Hello ahmad090,

Can you test creating a small test VM (with the Default vSAN Storage Policy) and then retrieve and attach the vmkernel.log and clomd.log from a host?

Are all hosts out of Maintenance Mode? What is the output of the following, run on a host?

# cmmds-tool find -t NODE_DECOM_STATE -f json

Is the Witness VM on the same build as all the data-nodes in the cluster?

Bob

ahmad090
Enthusiast

I'll send you the requested logs.

Meanwhile, here is the requested output:

[root@Venus:~] cmmds-tool find -t NODE_DECOM_STATE -f json

{

"entries":

[

{

   "uuid": "5ad83cbe-cfc8-9724-3da9-d06726ce4a82",

   "owner": "5ad83cbe-cfc8-9724-3da9-d06726ce4a82",

   "health": "Healthy",

   "revision": "21",

   "type": "NODE_DECOM_STATE",

   "flag": "2",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

,{

   "uuid": "5ad83d66-78a3-4a3c-a9c3-d06726ce9576",

   "owner": "5ad83d66-78a3-4a3c-a9c3-d06726ce9576",

   "health": "Healthy",

   "revision": "18",

   "type": "NODE_DECOM_STATE",

   "flag": "2",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

,{

   "uuid": "5ad73439-e272-bb5e-f9b2-d06726ce91aa",

   "owner": "5ad73439-e272-bb5e-f9b2-d06726ce91aa",

   "health": "Healthy",

   "revision": "22",

   "type": "NODE_DECOM_STATE",

   "flag": "2",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

,{

   "uuid": "5ad9c06e-c8bc-8896-ddbd-d06726ce446a",

   "owner": "5ad9c06e-c8bc-8896-ddbd-d06726ce446a",

   "health": "Healthy",

   "revision": "15",

   "type": "NODE_DECOM_STATE",

   "flag": "2",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

,{

   "uuid": "5ad84c7e-ff80-7499-a7b0-005056842a39",

   "owner": "5ad84c7e-ff80-7499-a7b0-005056842a39",

   "health": "Healthy",

   "revision": "0",

   "type": "NODE_DECOM_STATE",

   "flag": "2",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

,{

   "uuid": "5ad75a85-339f-a3fe-4b41-d06726ce446a",

   "owner": "5ad75a85-339f-a3fe-4b41-d06726ce446a",

   "health": "Unhealthy",

   "revision": "1",

   "type": "NODE_DECOM_STATE",

   "flag": "0",

   "minHostVersion": "0",

   "md5sum": "3c2593056659ee3c9e97039a3eefea8e",

   "valueLen": "80",

   "content": {"decomState": 0, "decomJobType": 0, "decomJobUuid": "00000000-0000-0000-0000-000000000000", "progress": 0, "affObjList": [ ], "errorCode": 0, "updateNum": 0, "majorVersion": 0},

   "errorStr": "(null)"

}

]

}

[root@Venus:~]
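Worth noting in the listing above: the last entry reports "health": "Unhealthy" (possibly a stale record left behind by a node). For long cmmds-tool listings, a grep sketch like the following can surface such entries quickly; the heredoc stands in for the live command (`cmmds-tool find -t NODE_DECOM_STATE -f json`) so the snippet is self-contained:

```shell
# Surface non-Healthy entries, with the two lines (uuid, owner) above each hit.
# Against a live host this would be:
#   cmmds-tool find -t NODE_DECOM_STATE -f json | grep -B2 '"health": "Unhealthy"'
grep -B2 '"health": "Unhealthy"' <<'EOF'
   "uuid": "5ad84c7e-ff80-7499-a7b0-005056842a39",
   "owner": "5ad84c7e-ff80-7499-a7b0-005056842a39",
   "health": "Healthy",
   "uuid": "5ad75a85-339f-a3fe-4b41-d06726ce446a",
   "owner": "5ad75a85-339f-a3fe-4b41-d06726ce446a",
   "health": "Unhealthy",
EOF
```

This prints only the uuid/owner/health lines of the Unhealthy entry, which makes the odd record easy to spot in a six-node dump.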

ahmad090
Enthusiast

This error occurred while creating the VM, and the requested logs are attached:

Cannot complete file creation operation.

Operation failed, diagnostics report: Failed to create directory TESTVM-2 (Cannot Create File)

TheBobkin
VMware Employee

Hello ahmad090,

Nothing definitive in the logs, but there does appear to be something strange going on in relation to that ISO and another one:

2018-04-25T09:55:43.317Z cpu27:67991 opID=ee17b019)WARNING: com.vmware.vmklinkmpi: VmklinkMPI_CallSync:1303: No response received for message 0x145 on osfs-vmklink (wait status Timeout)

2018-04-25T09:55:43.317Z cpu27:67991 opID=ee17b019)osfs: OSFSVmklinkCall:231: vmklink call failed with: Timeout

2018-04-25T09:55:43.317Z cpu27:67991 opID=ee17b019)osfs: OSFS_VmklinkLookup:479: Error making Lookup VmklinkCall

2018-04-25T09:55:43.317Z cpu27:67991 opID=ee17b019)osfs: OSFS_Lookup:2579: Lookup error: file = W12R2, status = Timeout

2018-04-25T09:55:43.319Z cpu24:1815742 opID=bcf9386)WARNING: VSAN: Vsan_OpenDevice:1052: Failed to open VSAN device '0d69d85a-521c-f0e1-7232-d06726ce446a' with DevLib: Busy

2018-04-25T09:55:43.319Z cpu24:1815742 opID=bcf9386)WARNING: VSAN: Vsan_OpenDevice:1052: Failed to open VSAN device '0d69d85a-521c-f0e1-7232-d06726ce446a' with DevLib: Busy

2018-04-25T09:55:43.319Z cpu24:1815742 opID=bcf9386)Vol3: 2538: Could not open device '0d69d85a-521c-f0e1-7232-d06726ce446a' for probing: Busy

2018-04-25T10:37:50.409Z cpu5:65791)FS3J: 3035: Aborting txn (0x4308160b4540) callerID: 0xc1d00006 due to failure pre-committing: Lost previously held disk lock

2018-04-25T10:37:50.409Z cpu5:65791)BC: 5033: Failed to flush 128 buffers of size 8192 each for object 'VMware-ESXi-6.5.0-Update1-7388607-HPE-650.U1.10.2.0.23-Feb2018.iso' f530 28 3 5ae04eda 9b8666a0 67d01586 6a44ce26 400cc4 2 0 0 0 0 0: Busy

2018-04-25T10:37:50.413Z cpu5:65791)FS3J: 3035: Aborting txn (0x4308160b4540) callerID: 0xc1d00006 due to failure pre-committing: Lost previously held disk lock

2018-04-25T10:37:50.413Z cpu5:65791)BC: 5033: Failed to flush 128 buffers of size 8192 each for object 'VMware-ESXi-6.5.0-Update1-7388607-HPE-650.U1.10.2.0.23-Feb2018.iso' f530 28 3 5ae04eda 9b8666a0 67d01586 6a44ce26 400cc4 2 0 0 0 0 0: Busy

2018-04-25T10:37:50.417Z cpu5:65791)FS3J: 3035: Aborting txn (0x4308160b4540) callerID: 0xc1d00006 due to failure pre-committing: Lost previously held disk lock

2018-04-25T10:37:50.417Z cpu5:65791)BC: 5033: Failed to flush 91 buffers of size 8192 each for object 'VMware-ESXi-6.5.0-Update1-7388607-HPE-650.U1.10.2.0.23-Feb2018.iso' f530 28 3 5ae04eda 9b8666a0 67d01586 6a44ce26 400cc4 2 0 0 0 0 0: Busy

2018-04-25T10:37:50.420Z cpu5:65791)FS3J: 3035: Aborting txn (0x4308160b4540) callerID: 0xc1d00006 due to failure pre-committing: Lost previously held disk lock

2018-04-25T10:37:50.420Z cpu5:65791)BC: 5033: Failed to flush 26 buffers of size 8192 each for object 'VMware-ESXi-6.5.0-Update1-7388607-HPE-650.U1.10.2.0.23-Feb2018.iso' f530 28 3 5ae04eda 9b8666a0 67d01586 6a44ce26 400cc4 2 0 0 0 0 0: Busy

Did you upload these ISOs directly to the vsandatastore, as opposed to into a namespace folder?

If you know the exact names of the files, you could try using rm to remove them without first doing an 'ls' at the datastore level.

Alternatively, you could determine the UUIDs of the Objects and delete them that way, but of course only do this with extreme caution (assuming they are being stored as Objects and not just stored as data the way -flat files are).

If you have S&S for this cluster, I would advise you to open a Support Request so that one of my colleagues can take a better look at it.
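The object-by-UUID route might look like the sketch below. The objtool path and flags are recalled from ESXi builds of this era (an assumption; verify on the host before use, and treat delete as a last resort). The UUID is the stuck namespace from the earlier ls output, and the commands are echoed rather than run since they only make sense on an ESXi host:

```shell
# Assumed tool location/flags on ESXi 6.5-era builds; verify before running.
OBJ_UUID="0d69d85a-521c-f0e1-7232-d06726ce446a"   # the stuck namespace from 'ls'
echo "/usr/lib/vmware/osfs/bin/objtool getAttr -u ${OBJ_UUID}"   # inspect first
echo "/usr/lib/vmware/osfs/bin/objtool delete -u ${OBJ_UUID} -f" # force-delete, last resort
```

Inspecting with getAttr first confirms the object's type and state before anything destructive is attempted.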

Edit: 2nd look at logs.

Bob
