CMDTeam
Contributor

Datastore disappeared (probably because of the 6.5 to 6.7 update)

Hi Everyone,

We have just noticed that our datastore is missing and we cannot see it in the ESXi web UI.

Not sure when this happened, but possibly after we updated ESXi to 6.7.

The VMs are running fine.

We can see the Adapter under Storage -> Adapters.

We can see the hard drive under Storage -> Devices.

But the Datastores list is empty, and we cannot change any settings of the VMs because it throws an error.

When I try to browse: "There are no browsable datastores. Currently, only vSAN and local VMFS datastores can be browsed."

I have tried to figure it out and ran a few commands over SSH, but I cannot see what the issue could be or how to fix it.

Could you please let me know how to get the datastore back?

esxcli storage vmfs extent list

    Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition

    -----------  -----------------------------------  -------------  ------------------------------------  ---------

    datastore1   5c8a7789-1a4c3a5c-0a47-b083fec42bdd              0  naa.6b083fe0cfb8ba002400ee74143fdef7          3

esxcfg-mpath -l

    sas.5b083fe0cfb8ba00-sas.6000ee74143fdef7-naa.6b083fe0cfb8ba002400ee74143fdef7

       Runtime Name: vmhba0:C2:T0:L0

       Device: naa.6b083fe0cfb8ba002400ee74143fdef7

       Device Display Name: Local DELL Disk (naa.6b083fe0cfb8ba002400ee74143fdef7)

       Adapter: vmhba0 Channel: 2 Target: 0 LUN: 0

       Adapter Identifier: sas.5b083fe0cfb8ba00

       Target Identifier: sas.6000ee74143fdef7

       Plugin: NMP

       State: active

       Transport: sas

       Adapter Transport Details: 5b083fe0cfb8ba00

       Target Transport Details: 6000ee74143fdef7

esxcfg-scsidevs -c

    Device UID                            Device Type      Console Device                                            Size      Multipath PluginDisplay Name

    naa.6b083fe0cfb8ba002400ee74143fdef7  Direct-Access    /vmfs/devices/disks/naa.6b083fe0cfb8ba002400ee74143fdef7  3814400MB NMP     Local DELL Disk (naa.6b083fe0cfb8ba002400ee74143fdef7)

esxcli storage vmfs snapshot list

     This returns nothing...

esxcli system coredump partition list

    Name                                    Path                                                        Active  Configured

    --------------------------------------  ----------------------------------------------------------  ------  ----------

    naa.6b083fe0cfb8ba002400ee74143fdef7:7  /vmfs/devices/disks/naa.6b083fe0cfb8ba002400ee74143fdef7:7   false       false

    naa.6b083fe0cfb8ba002400ee74143fdef7:9  /vmfs/devices/disks/naa.6b083fe0cfb8ba002400ee74143fdef7:9    true        true

partedUtil getptbl /vmfs/devices/disks/naa.6b083fe0cfb8ba002400ee74143fdef7

    gpt

    486267 255 63 7811891200

    1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B systemPartition 128

    5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0

    6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0

    7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0

    8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0

    9 1843200 7086079 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0

    2 7086080 15472639 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0

    3 15472640 7811891166 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
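I also sanity-checked the partition table with a quick Python sketch (assuming 512-byte logical sectors, which is standard for this class of disk). Partition 3, the vmfs one, comes out around 3.7 TiB, which is consistent with the 3814400 MB device size above, so the table itself looks intact:

```python
# Sanity-check the partedUtil output: compute each partition's size,
# assuming 512-byte logical sectors.
SECTOR = 512

# (number, start_sector, end_sector, type_label) copied from partedUtil getptbl
partitions = [
    (1, 64, 8191, "systemPartition"),
    (5, 8224, 520191, "linuxNative"),
    (6, 520224, 1032191, "linuxNative"),
    (7, 1032224, 1257471, "vmkDiagnostic"),
    (8, 1257504, 1843199, "linuxNative"),
    (9, 1843200, 7086079, "vmkDiagnostic"),
    (2, 7086080, 15472639, "linuxNative"),
    (3, 15472640, 7811891166, "vmfs"),
]

for num, start, end, ptype in partitions:
    size_gib = (end - start + 1) * SECTOR / 2**30
    print(f"partition {num:>2} ({ptype:15}): {size_gib:10.2f} GiB")
```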

If I run a rescan on the device the kernel log shows this:

2019-04-11T16:47:15.591Z cpu6:2099152 opID=b3571b7d)World: 11943: VC opID 24fae6e5 maps to vmkernel opID b3571b7d

2019-04-11T16:47:15.591Z cpu6:2099152 opID=b3571b7d)NVDManagement: 1461: No nvdimms found on the system

2019-04-11T16:47:25.437Z cpu2:2097756)ScsiDeviceIO: 3068: Cmd(0x459a41156c40) 0x1a, CmdSN 0x11be from world 0 to dev "naa.6b083fe0cfb8ba002400ee74143fdef7" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.

2019-04-11T16:47:25.466Z cpu2:2099150 opID=bf5b1e18)World: 11943: VC opID 24fae6ee maps to vmkernel opID bf5b1e18

2019-04-11T16:47:25.466Z cpu2:2099150 opID=bf5b1e18)VC: 4616: Device rescan time 16 msec (total number of devices 5)

2019-04-11T16:47:25.466Z cpu2:2099150 opID=bf5b1e18)VC: 4619: Filesystem probe time 30 msec (devices probed 3 of 5)

2019-04-11T16:47:25.466Z cpu2:2099150 opID=bf5b1e18)VC: 4621: Refresh open volume time 1 msec
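I tried to decode that failing command from the rescan log: Cmd 0x1a is a SCSI MODE SENSE(6), D:0x2 is a CHECK CONDITION status, and the sense triple 0x5 0x24 0x0 means sense key Illegal Request with ASC 0x24, "Invalid field in CDB". So the disk rejected an optional mode page query, which local SAS disks commonly do. Here is the small Python sketch I used (the sense key and ASC mappings are from the SCSI SPC spec; the helper function is just illustrative, not an ESXi tool):

```python
# Decode the "Valid sense data: 0x5 0x24 0x0" triple from the vmkernel log.
# Mappings are from the SCSI SPC specification (sense key, ASC/ASCQ);
# this decoder is an illustrative helper, not part of ESXi.

SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
}

ASC = {
    (0x24, 0x00): "Invalid field in CDB",
    (0x20, 0x00): "Invalid command operation code",
    (0x3A, 0x00): "Medium not present",
}

def decode_sense(key: int, asc: int, ascq: int) -> str:
    k = SENSE_KEYS.get(key, f"Unknown key 0x{key:x}")
    a = ASC.get((asc, ascq), f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{k}: {a}"

# The triple from the log line above:
print(decode_sense(0x5, 0x24, 0x0))  # Illegal Request: Invalid field in CDB
```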

5 Replies
continuum
Immortal

> But the Datastore is empty and we cannot change any settings of the VMs as it throws an error
Just to be sure ... what happens if you use SSH and try to
cd /vmfs/volumes/datastore1
At the moment I would assume that any further attempts to do anything via the GUI or the command line could make matters worse.
When can you schedule a reboot of the ESXi host?
I would also try to back up all running VMs - if that is possible.
Also create a VMFS header dump. That will help with recovery in case the VMFS datastore is too damaged to mount again after a reboot.
See Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay

Please attach the complete vmkernel.log to your next reply.
Ulli

Do you need support with a recovery problem? Send a message via Skype: "sanbarrow"
cbmadmin
Contributor

We had a similar problem recently after upgrading from v6.5 to v6.7, in that we lost access to about half of the datastores on our SAN. This affected not only the host that we upgraded, but also the other 2 hosts in the cluster, which were still running v6.5. Because of path failover (lost Active paths failed over to Standby paths), we didn't discover the extent of the problem until it was too late. 24 hours later, hosts were showing as Not Responding in vCenter, with the VMs still running but showing as Disconnected.

So if you're using Direct Attached Storage with your host, then maybe the fault is similar? If you're in a vSAN environment, then ignore what I'm about to say. Either way, talk to VMware and get help quickly.

-------------------- what happened in our environment follows:

VMware advised that the cause was a storage fault, as evidenced by storage controller errors in the vSphere logs ("state: dead sas") and that we needed to raise a ticket with our storage vendor (Dell EMC).

Dell EMC investigated and identified the likely root cause as a change in the behaviour of a key storage driver (lsi_msgpt3) between vSphere v6.5 and v6.7, which requires a configuration change to one of the driver's parameters.

This configuration requirement is documented in knowledge base articles from both Dell EMC (https://www.dell.com/support/article/US/en/19/sln313031/sc-storage-customer-notification-driver-comp...), published 21/1/2019, and VMware (https://kb.vmware.com/s/article/67032), published 9/2/2019.

Again, TALK TO VMWARE FIRST is my suggestion; don't just change stuff unless you're sure of the diagnosis!

We needed to run the following commands on each host via CLI:

esxcfg-scsidevs -a

  (confirm that lsi_msgpt3 is the kernel driver)

vmkload_mod -s lsi_msgpt3 |grep Version

  (check that the driver version is greater than 12)

esxcli system module parameters list -m lsi_msgpt3

  (check that issue_scsi_cmd_to_bringup_drive is not set / n.b. default=1)

esxcli system module parameters set -p issue_scsi_cmd_to_bringup_drive=0 -m lsi_msgpt3

  (set issue_scsi_cmd_to_bringup_drive = 0)

esxcli system module parameters list -m lsi_msgpt3

  (check that issue_scsi_cmd_to_bringup_drive is now = 0)

reboot

  (restart the host, then manually restart workloads)

AGAIN, I cannot emphasise this strongly enough: DON'T CHANGE ANYTHING unless you are sure of the diagnosis.

Hope this helps.

cbmadmin
Contributor

Hmmm, after looking at your command outputs again, I'm not convinced it is the same issue.

The VMware article highlights in red that the storage shows as 0MB, whereas yours shows a value greater than zero.

So maybe not the same thing, although it does seem to be similar in some respects.

CMDTeam
Contributor

Hi Continuum,

Thanks for your help, I will try to answer everything one by one.

[peter@IS-77752:~] cd /vmfs/volumes/datastore1

[peter@IS-77752:/vmfs/volumes/5c8a7789-1a4c3a5c-0a47-b083fec42bdd]

And if I run "ls -l", it shows all the folders with the VMs.

> When can you schedule a reboot of the ESXi ?

I am going to reboot it today, but I am not sure it will solve the problem. I will let you know.

> I would also try to backup all running VMs - if that is possible.

As a workaround, I am thinking about creating a new datastore, if that is possible, and moving the VMs to the new one.

Then I would remove the VMs and add them again from the new datastore. I am not 100% sure how to do it, or if it is even possible though...

> Also create a VMFS header-dump. That will help to recover in case the VMFS datastore is too damaged to mount again after a reboot.

I have tried, but there is not enough free space for it, even if I use gzip.

And I do not have any other datastore to use...

> Please attach the complete vmkernel.log to your next reply.

See attached.

Thanks.

CMDTeam
Contributor

UPDATE:
Thinking it over, I realised that after we updated to 6.7 we still had the datastore for a while; it only disappeared after a few days.

And now we have noticed something really strange...

If we log in to ESXi with a different user, not the default root user, then we can see the datastore and everything seems to be working fine.

It seems like the issue is with the root user and not with the datastore itself.

Do you know about an issue like this?
