VMware Cloud Community
absdear
Contributor

Datastore shows negative free space, then suddenly a huge capacity

Dear All,

We came across a strange issue last month.

We have a VMware cluster with two ESXi 4.1 hosts.

On Friday morning we suddenly lost connection to all VMs running on both ESX hosts, and the hosts lost connection to the storage.

After the first ESX host restart, the third volume was still in a disconnected state.

After a host restart and a rescan, the third datastore showed a negative free size.

The bizarre part was that after the third rescan it suddenly showed a total capacity of 4000 TB and a free size of 293 TB. The entire storage box is no bigger than 8 TB.
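For anyone who wants to flag this kind of reading automatically, here is a minimal Python sketch of the sanity check implied above. The function name `size_problems` and its parameters are hypothetical, and the values have to be supplied by hand or from whatever inventory tool you use; this does not query vCenter or ESXi.

```python
def size_problems(capacity_tb, free_tb, array_tb):
    """Return a list of reasons why a reported datastore size is implausible.

    capacity_tb / free_tb: values reported for the datastore (in TB).
    array_tb: known physical size of the whole storage box (in TB).
    """
    problems = []
    if free_tb < 0:
        problems.append("negative free space")
    if free_tb > capacity_tb:
        problems.append("free space exceeds capacity")
    if capacity_tb > array_tb:
        problems.append("capacity exceeds physical array size")
    return problems


# Values from this post: 4000 TB capacity, 293 TB free, on a box of at most 8 TB.
print(size_problems(capacity_tb=4000, free_tb=293, array_tb=8))
```

An empty list means the reported numbers are at least internally consistent; it says nothing about why a bad reading occurred.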

This situation persisted from Friday until Monday, when after another rescan the datastore showed the correct size again. The issue was completely resolved and has not occurred since.

We are still wondering what went wrong during those 3 days.

There were zero errors at the storage level or at the hardware level.

A strange issue that resolved itself! Has anyone seen this in their environment?

[Attached screenshot: test - Copy.PNG]

We saw this in the logs:

Dec  6 03:25:46 vmkernel: 0:15:36:09.307 cpu0:4191)WARNING: ScsiPath: 3248: Failed to issue command 0xa3 on path vmhba3:C0:T1:L3 (World 0): Timeout

Dec  6 03:25:46 vmkernel: 0:15:36:09.307 cpu0:4191)VMW_SATP_ALUA: satp_alua_issueCommandOnPath: Failed to issue sync command on path "vmhba3:C0:T1:L3" (DOWN) due to status Timeout.
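To see whether the timeouts cluster on one path, lines like the two above can be tallied per path. The following is a rough Python sketch, assuming the log format matches these two samples exactly; the regex and the function name `count_path_failures` are my own and not part of any VMware tool. As far as I know, opcode 0xa3 is the SCSI MAINTENANCE IN command, which ALUA uses to report target port groups, so a timeout there fits the SATP ALUA message that follows it.

```python
import re
from collections import Counter

# Matches both sample formats: with or without "sync", with or without an
# opcode, and with the path optionally wrapped in double quotes.
PATH_ERROR = re.compile(
    r"Failed to issue (?:sync )?command(?: (?P<opcode>0x[0-9a-fA-F]+))? on path "
    r"\"?(?P<path>vmhba\d+:C\d+:T\d+:L\d+)\"?"
)


def count_path_failures(lines):
    """Count 'Failed to issue command' messages per storage path."""
    counts = Counter()
    for line in lines:
        match = PATH_ERROR.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


sample = [
    "Dec  6 03:25:46 vmkernel: 0:15:36:09.307 cpu0:4191)WARNING: ScsiPath: 3248: "
    "Failed to issue command 0xa3 on path vmhba3:C0:T1:L3 (World 0): Timeout",
    "Dec  6 03:25:46 vmkernel: 0:15:36:09.307 cpu0:4191)VMW_SATP_ALUA: "
    "satp_alua_issueCommandOnPath: Failed to issue sync command on path "
    '"vmhba3:C0:T1:L3" (DOWN) due to status Timeout.',
]

for path, count in count_path_failures(sample).items():
    print(path, count)
```

To run it against a real log, pass an open file object instead of `sample`; where the vmkernel log lives depends on the ESX/ESXi version.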

2 Replies
vfk
Expert

I suspect you have a corrupt volume. The last time I saw this, the volume had run out of space and its metadata was corrupted. I would recommend contacting VMware support to investigate this further.

Cyberfed27
Hot Shot

We just had a very similar issue on our system. We are running ESXi 5.1 and vCenter 5.5.

We added a new 1.5 TB datastore to our VMware cluster. Only a few hosts were seeing the new datastore. After a bunch of refreshes and "rescan all" runs, we managed to get all the hosts to see it. Then things got really weird: the size was reported erroneously, ranging from 4 TB to over 290,000 TB! The datastore would jump into a red alarm and then disconnect from some hosts. On other hosts it showed up correctly as 1.5 TB, but with 700 GB used! After another refresh the datastore would disappear.

I checked on our SAN and there were no errors or messages to indicate any issue with the SAN, LUN, or communication paths.

We ended up creating a new LUN and datastore and simply deleted the one causing the issues. So far everything seems fine. We have never seen this happen before. We too concluded that it may have been some type of corruption on the LUN or during the VMFS5 formatting. I know this doesn't offer much help in solving the issue. Since it was a new datastore with nothing on it, we opted to delete it and move on rather than troubleshoot deeper with VMware.
