VMware Cloud Community
vds
Contributor

ESXi loses size of /vmfs/volumes/vm

I have ESXi 5.5 Update 1 running on an Intel NUC, and I've seen it get stupid twice now in six weeks. The symptoms are that the VMs are not accessible, although they show up as green/running in the Windows client. At the same time, the other registered VMs are greyed out as unavailable in the GUI, and when I log into the ESXi host via SSH, df looks very strange: the /vmfs/volumes/vm entry shows zero bytes in every column.

~ # df
Filesystem     Bytes      Used Available Use% Mounted on
VMFS-5             0         0         0   0% /vmfs/volumes/vm
vfat       261853184 165478400  96374784  63% /vmfs/volumes/6bd3fde8-6e085f5c-08fa-706744cb5db9
vfat       261853184 165498880  96354304  63% /vmfs/volumes/104e0cef-148f4ff6-92f4-23c5628c7b64
vfat       299712512 202006528  97705984  67% /vmfs/volumes/53f1e12f-31e9cdc4-de70-c03fd566c7a4

If I use the Windows client to shut down the VMs and put the host into maintenance mode, I can reboot it OK. When it comes back up all is fine, although I then need to take the host out of maintenance mode and restart the guest VMs. The df output looks normal again:

~ # df
Filesystem        Bytes         Used    Available Use% Mounted on
VMFS-5     536602476544 385139867648 151462608896  72% /vmfs/volumes/vm
vfat          261853184    165478400     96374784  63% /vmfs/volumes/6bd3fde8-6e085f5c-08fa-706744cb5db9
vfat          261853184    165498880     96354304  63% /vmfs/volumes/104e0cef-148f4ff6-92f4-23c5628c7b64
vfat          299712512    202006528     97705984  67% /vmfs/volumes/53f1e12f-31e9cdc4-de70-c03fd566c7a4
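The difference between the two df listings (a VMFS volume reporting zero bytes versus its real size) can be checked mechanically. The following is a minimal Python sketch, not part of the original post: the function name zero_size_mounts is hypothetical, and it assumes the six-column df layout shown above, pasted into a string or captured from the host.

```python
def zero_size_mounts(df_text):
    """Return mount points whose size column is 0 in `df` output.

    Assumes the layout: Filesystem Bytes Used Available Use% Mounted-on.
    """
    flagged = []
    for line in df_text.splitlines():
        parts = line.split()
        # Skip blank lines and the header row (its second field is "Bytes").
        if len(parts) < 6 or not parts[1].isdigit():
            continue
        if int(parts[1]) == 0:
            # Rejoin in case the mount point itself contains spaces.
            flagged.append(" ".join(parts[5:]))
    return flagged


sample = """\
Filesystem     Bytes      Used Available Use% Mounted on
VMFS-5             0         0         0   0% /vmfs/volumes/vm
vfat       261853184 165478400  96374784  63% /vmfs/volumes/6bd3fde8-6e085f5c-08fa-706744cb5db9
"""

print(zero_size_mounts(sample))
```

A check like this only detects the symptom; it says nothing about why the datastore became inaccessible.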

Any ideas on (a) what is going on, (b) which logs to check and what to look for in them, and (c) how to prevent it?

I have to admit to being lost in all the vWhatever product buzzwords, strange CLIs and log files, so be nice 🙂

3 Replies
pratjain
VMware Employee

The host is losing access to the VMFS datastore, and possibly to the underlying storage device.

Please attach the vmkernel logs from the host covering the time when the issue occurs.

Regards, PJ
vds
Contributor

Here goes - grep out the zillion annoying "maps to vmkernel" lines and it gets interesting.

Note - the reboot was me manually rebooting the ESXi host.
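The filtering described above can be sketched in Python for a log file that has been copied off the host. This is an illustration only: the function name drop_opid_noise is hypothetical, and the sample lines are taken from the log excerpt quoted later in this thread.

```python
def drop_opid_noise(lines, noise="maps to vmkernel opID"):
    """Yield non-blank log lines that do not contain the opID-mapping message."""
    for line in lines:
        if line.strip() and noise not in line:
            yield line.rstrip("\n")


sample = [
    "2014-10-02T05:56:20.004Z cpu0:44713)World: 14296: VC opID hostd-ed47 maps to vmkernel opID edabca69\n",
    "\n",
    "2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1.00: irq_stat 0x09000000, interface fatal error\n",
    "2014-10-02T05:56:56.242Z cpu1:33386)<6>ata1: hard resetting link\n",
]

for line in drop_opid_noise(sample):
    print(line)
```

To run it against a real file, pass an open file object instead of the sample list, since iterating a file yields its lines.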

vuzzini
Enthusiast

Hello vds,

I had a chance to review the vmkernel logs you uploaded and noticed the messages below:

vmkernel.log snippet

-----------------------------

VC opID hostd-d62d maps to vmkernel opID d900a9ad

2014-10-02T05:54:40.003Z cpu0:33932)World: 14296: VC opID hostd-e388 maps to vmkernel opID f4bc832b

2014-10-02T05:55:00.004Z cpu2:34461)World: 14296: VC opID hostd-d62d maps to vmkernel opID d900a9ad

2014-10-02T05:55:20.002Z cpu0:44713)World: 14296: VC opID hostd-501b maps to vmkernel opID 94a005aa

2014-10-02T05:56:00.003Z cpu0:33939)World: 14296: VC opID hostd-d62d maps to vmkernel opID d900a9ad

2014-10-02T05:56:20.004Z cpu0:44713)World: 14296: VC opID hostd-ed47 maps to vmkernel opID edabca69

2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x280100 action 0x6 frozen

2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1.00: irq_stat 0x09000000, interface fatal error

2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1: SError: { UnrecovData 10B8B BadCRC }

2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1.00: cmd 60/20:08:dd:ec:fb/00:00:38:00:00/40 tag 1 ncq 16384 in

         res 40/00:0c:dd:ec:fb/00:00:38:00:00/40 Emask 0x10 (ATA bus error)

2014-10-02T05:56:56.242Z cpu1:33386)<3>ata1.00: status: { DRDY }

2014-10-02T05:56:56.242Z cpu1:33386)<6>ata1: hard resetting link

2014-10-02T05:57:00.003Z cpu0:33939)World: 14296: VC opID hostd-2335 maps to vmkernel opID 70fae8a4

2014-10-02T05:57:01.769Z cpu3:33386)<4>ata1: port is slow to respond, please be patient (Status 0x80)

2014-10-02T05:57:06.265Z cpu0:33386)<3>ata1: COMRESET failed (errno=-16)

2014-10-02T05:57:06.265Z cpu0:33386)<6>ata1: hard resetting link

2014-10-02T05:57:06.364Z cpu0:32785)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x2a (0x412e80873b40, 32777) to dev "t10.ATA_____WDC_WD10JFCX2D68N6GN0_________________________WD2DWX71A44E6438" on path "vmhba0:C0:T0:L0" Failed: H:0x5 D:0x0 P:0x0 Possible sen$

2014-10-02T05:57:06.365Z cpu0:32785)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA_____WDC_WD10JFCX2D68N6GN0_________________________WD2DWX71A44E6438" state in doubt; requested fast path state update...

2014-10-02T05:57:06.365Z cpu0:32785)ScsiDeviceIO: 2337: Cmd(0x412e80873b40) 0x2a, CmdSN 0xa0015 from world 32777 to dev "t10.ATA_____WDC_WD10JFCX2D68N6GN0_________________________WD2DWX71A44E6438" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x$

2014-10-02T05:57:06.768Z cpu2:33386)<6>ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

2014-10-02T05:57:06.770Z cpu2:33386)<6>ata1.00: configured for UDMA/133

2014-10-02T05:57:06.770Z cpu2:33386)<6>ata1: EH complete

2014-10-02T05:57:06.770Z cpu1:33274)ScsiDeviceIO: 2324: Cmd(0x412e80840f80) 0x28, CmdSN 0xa0013 from world 35470 to dev "t10.ATA_____WDC_WD10JFCX2D68N6GN0_________________________WD2DWX71A44E6438" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0.

2014-10-02T05:57:20.002Z cpu1:34461)World: 14296: VC opID hostd-d62d maps to vmkernel opID d900a9ad

2014-10-02T05:57:40.002Z cpu2:44713)World: 14296: VC opID hostd-06eb maps to vmkernel opID 646d9824

2014-10-02T05:57:51.404Z cpu2:33939)World: 14296: VC opID hostd-d62d maps to vmkernel opID d900a9ad

2014-10-02T05:58:00.003Z cpu0:33932)World: 14296: VC opID hostd-2335 maps to vmkernel opID 70fae8a4

2014-10-02T05:58:55.481Z cpu1:33932)World: 14296: VC opID hostd-2335 maps to vmkernel opID 70fae8a4

2014-10-02T05:59:00.002Z cpu2:33939)World: 14296: VC opID hostd-ed47 maps to vmkernel opID edabca69

2014-10-02T05:59:40.004Z cpu2:44713)World: 14296: VC opID hostd-b854 maps to vmkernel opID 44dad3cd

2014-10-02T05:59:51.410Z cpu0:33939)World: 14296: VC opID hostd-2335 maps to vmkernel opID 70fae8a4

2014-10-02T06:00:00.002Z cpu3:33932)

The log shows an ATA bus error (Emask 0x10), an interface fatal error with BadCRC, and a COMRESET failure on ata1, followed by failed I/O to the WDC disk on vmhba0. This points to a problem in the SATA path, most likely the controller used on the server. I would also recommend contacting your hardware vendor and asking them to run a hardware diagnostic check on the server.
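To read the H:/D:/P: fields in the failed-command lines above, a small decoder can help. The following is a rough Python sketch, not an official tool: the function name decode_status is hypothetical, the lookup tables are deliberately partial, and the names follow VMware's published guidance on interpreting SCSI sense codes together with the standard SCSI status and sense-key values.

```python
import re

# Partial tables; unknown values fall back to their hex form.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY",
               0x3: "TIMEOUT", 0x5: "ABORT", 0x7: "ERROR", 0x8: "RESET"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEY = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
             0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
             0xB: "ABORTED COMMAND"}

STATUS_RE = re.compile(r"H:0x([0-9a-fA-F]+) D:0x([0-9a-fA-F]+) P:0x([0-9a-fA-F]+)")
# Only "Valid sense data" is decoded; "Possible sense data" is ignored.
SENSE_RE = re.compile(r"Valid sense data: 0x([0-9a-fA-F]+)")


def decode_status(line):
    """Return decoded host/device status (and sense key, if valid) or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    host, device, _plugin = (int(v, 16) for v in m.groups())
    result = {
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
    }
    s = SENSE_RE.search(line)
    if s:
        key = int(s.group(1), 16)
        result["sense_key"] = SENSE_KEY.get(key, hex(key))
    return result


write_failure = "failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x$"
read_failure = "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0xb 0x0 0x0."

print(decode_status(write_failure))
print(decode_status(read_failure))
```

Read this way, the write (opcode 0x2a, WRITE(10)) was aborted at the host side while the link was being reset, and the read (opcode 0x28, READ(10)) came back from the device with a check condition and an ABORTED COMMAND sense key, which is consistent with the link errors logged just before them.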

Sandeep Vuzzini, Sr. DevOps Engineer