VMware Cloud Community
LeslieBNS9
Enthusiast

Can't unmount datastore(s) - file system is busy error even though it's not in use

We are doing some cleanup on our datastores, so we are Storage vMotioning VMs off some datastores and then deleting the LUNs.

For every datastore we clean up, about 3 out of 16 hosts won't unmount it. It is not always the same 3 hosts; it varies.

The error that is occurring indicates that the file system is busy.

Cannot unmount volume 'Datastore Name: Hosted_FC_LUN70 VMFS uuid: 5202965c-e09a124d-9e1b-001a64c85540' because file system is busy. Correct the problem and retry the operation.

But nothing on the datastore is in use. All VMs have been migrated off, the datastore is not used for heartbeating, vsantraced is disabled, SIOC is disabled, it is not part of a datastore cluster, and the coredumps have all been cleared.
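
For anyone wanting to repeat the same checks, something along these lines works from the host shell (standard ESXi 5.x commands; Hosted_FC_LUN70 is the datastore label from the error above):

# confirm the volume is still mounted and note its UUID
esxcli storage filesystem list | grep Hosted_FC_LUN70

# make sure no coredump partition lives on the LUN being retired
esxcli system coredump partition get

# confirm nothing but the VMFS metadata files remain on the volume
ls -alh /vmfs/volumes/Hosted_FC_LUN70/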

/vmfs/volumes/5202965c-e09a124d-9e1b-001a64c85540 # ls -allh

total 799744

drwxr-xr-t    1 root     root        1.2K Sep 10 15:04 .

drwxr-xr-x    1 root     root         512 Sep 10 15:12 ..

-r--------    1 root     root       10.3M Aug  7  2013 .fbb.sf

-r--------    1 root     root      254.7M Aug  7  2013 .fdc.sf

-rwxr-xr-x    1 root     root        1.0M Aug 14  2013 .iormstats.sf

-r--------    1 root     root        1.1M Aug  7  2013 .pb2.sf

-r--------    1 root     root      256.0M Aug  7  2013 .pbc.sf

-r--------    1 root     root      250.6M Aug  7  2013 .sbc.sf

-r--------    1 root     root        4.0M Aug  7  2013 .vh.sf

I also ran lsof and could not find any files in use on that LUN. I have also tried unmounting them from the command line and by connecting to the hosts directly with the vSphere Client.
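
For reference, the command-line attempts were along these lines (unmounting by label and by UUID; both fail with the same busy error):

esxcli storage filesystem unmount -l Hosted_FC_LUN70
esxcli storage filesystem unmount -u 5202965c-e09a124d-9e1b-001a64c85540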

Right now the only resolution has been to reboot the hosts that are having the problem. This is getting to be quite annoying, considering I have 30 datastores to clean up and cannot move on to the next datastore until the one I'm working on is completely unmounted and repurposed.

We also have very strict rules regarding host reboots, and they can only be performed in the evening, so at this rate I am looking at a month before all the cleanup can be finished.

These datastores I'm cleaning up are either VMFS3 or VMFS 5.54.

Does anyone have any ideas on why these won't unmount?

19 Replies
continuum
Immortal

Maybe this helps: http://kb.vmware.com/kb/2011220


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

LeslieBNS9
Enthusiast

I tried that and it still wouldn't let me unmount. Same error message. I also have SIOC turned off on the datastore.

LeslieBNS9
Enthusiast

Working with VMware Support, we were able to list the processes that are holding onto that datastore.

~ # vsish -e ls /storage/scsifw/devices/naa.6005076802850d082800000000000005/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done

32789      idle0

33601      helper48-2

32901      OCFlush

33602      helper48-3

33609      helper48-10

33603      helper48-4

33605      helper48-6

32903      BCFlush-0

32885      helper24-0

33600      helper48-1

33612      helper48-13

33610      helper48-11

33607      helper48-8

32840      helper14-0

33614      helper48-15

33599      helper48-0

32841      helper14-1

33613      helper48-14

33606      helper48-7

33480      helper43-0

33608      helper48-9

33611      helper48-12

33604      helper48-5

32842      helper14-2

32824      helper3-1

32823      helper3-0

32904      helper26-0

~ #

The problem is that these are all zombie processes and cannot be killed:

~ # kill 33601

sh: can't kill pid 33601: No such process

~ #

The only way to clear them out is to reboot the host. Does anyone have any insight into why we would be getting so many zombie processes? Every time I want to remove a datastore, we run into this problem.
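
In case it helps anyone else: these world IDs appear to be VMkernel worlds rather than user-space processes, which would explain why the shell kill reports "No such process". If the worlds holding a device belong to a VM, killing them through esxcli is worth trying before a reboot. A rough sketch, with a placeholder world ID:

# list running VMs and their world IDs
esxcli vm process list

# try a soft kill first; escalate to hard or force only if needed (12345 is a placeholder)
esxcli vm process kill --type=soft --world-id=12345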

JPM300
Commander

Hey LeslieBNS9

I know helper processes can spawn when a process is taking too long. Are these old helper processes from when you were troubleshooting the network issue inside your c7000 series, or are these new?

VMware KB: One host shows a Storage Initiator Error while all other hosts show SCSI Reservation Conf...

LeslieBNS9
Enthusiast

I'm not sure they are related to the C7000 troubleshooting. But now that I think about it, I wonder if they are a result of our vCenter crashing while taking VDP backups. Every time vCenter crashes while the backups are running, we see problematic change-tracking (ctk) files and many VMs that need consolidation. It seems that after I clean out these datastores the only files left are orphaned ctk and snapshot files.
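
In case it's useful, one way to look for those leftovers on a datastore before trying the unmount is something like this (the path is just an example):

# look for orphaned change-tracking and snapshot delta files left behind on the volume
find /vmfs/volumes/Hosted_FC_LUN70/ -name '*-ctk.vmdk' -o -name '*-delta.vmdk'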

JPM300
Commander

I can't say for certain, but it's definitely possible. Does the problem happen on datastores where you don't have these ctk files and orphaned snapshots?

LeslieBNS9
Enthusiast

We have them on all of our datastores at this point, but I am wondering whether the helper processes are spawned at the point those files are created. I think the datastore unmounting problem may just be a symptom of the vCenter crashes that then cause all these other problems we've seen. In theory, I should be able to reboot the hosts, and as long as vCenter doesn't crash, the unmounting problem won't occur again. Who knows. I guess only time will tell.

JarryG
Expert

Yes, it is. But not the way you tried!

_____________________________________________ If you found my answer useful please do *not* mark it as "correct" or "helpful". It is hard to pretend being noob with all those points! 😉
Notch201110141
Contributor

Services restart fixed this for me.

services.sh restart

Then I was able to unmount using:

esxcli storage filesystem unmount -l LUNXX
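
If restarting every service with services.sh is too disruptive, restarting just the management agents may be enough to release the volume in some cases (a lighter-weight variant; results will depend on what is actually holding the device):

# restart only the host and vCenter agents
/etc/init.d/hostd restart
/etc/init.d/vpxa restart

# then retry the unmount by label
esxcli storage filesystem unmount -l LUNXX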

EdZ314
Enthusiast

We've had the same problem several times as well. Restarting the agents as suggested in one of the replies worked in a couple cases - others required a host reboot. We're not using any special backup agents or anything like that - pretty basic ESXi and vCenter configuration with vCOPs and built-in tools. It gets quite frustrating. We're starting to turn off SIOC now - we suspect that was related to some other issues we were seeing, and I have my suspicions that it is related to this as well.

mikejroberts
Enthusiast

Same issue here and we ended up having to reboot 19 hosts.  Support was not able to find a solution, but said it should be resolved in 5.5 U2.  I cannot confirm that it is resolved, but I can confirm that the only thing that worked was a reboot.

LeslieBNS9
Enthusiast

We were never able to narrow down the root cause. It wasn't due to vCenter crashes from VDP or anything else that I can discern. We just bite the bullet and reboot our hosts every time this happens.

NelsonNetto
Contributor

I used services.sh restart and it worked for me.

There's the VMware KB: Unmounting a datastore used for Host Cache on ESXi 5.5 fails with the error: Cannot unmou... for reference.

Maybe you should consider checking it.

mikejroberts
Enthusiast

Unfortunately, it didn't work in our case.  I cannot tell you how many different things we (VMware Support and I) tried, but the end result required killing VMs or rebooting hosts.

jbartlett1
Contributor

Searching the web and finding this thread helped us get our volume unmounted without a reboot of the ESXi server. I thought I would share what we did; it may help someone else with a similar problem.

After a hardware failure of our iSCSI SAN, we were unable to unmount a datastore from an ESXi host. However, we were able to kill the parent VMX processes that were holding onto the datastore and then rescan the storage adapters, which unmounted and removed the datastore from the host. Some of the processes belonged to machines that were attached to the storage at the time of the SAN failure but no longer existed on the server, and killing those processes was the key to getting ESXi to unmount the datastore. The full session is in the transcript below; a condensed sketch of the steps comes first.
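
Condensed, the sequence was roughly this (the device ID and world IDs are from our environment and will differ on yours):

# 1. find the stuck volume and the device backing it
esxcli storage filesystem list
esxcfg-scsidevs -m

# 2. list the worlds still referencing the device and match them against the process table
vsish -e ls /storage/scsifw/devices/naa.6842b2b00066caec000084bb4f7e9e4e/worlds/ | sed 's:/::' | while read i; do ps | grep $i; done

# 3. kill the stale vmx/vmm worlds belonging to VMs that no longer exist
kill 11326
kill 11327

# 4. rescan the adapters and confirm the dead volume has dropped out of the filesystem list
esxcli storage core adapter rescan --all
esxcli storage filesystem list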

~ # esxcli storage filesystem list

Mount Point                                        Volume Name               UUID                                 Mounted Type    Size                 Free

-------------------------------------------------  ------------------------  -----------------------------------  ------- ------  -------------------  -------------------

/vmfs/volumes/4cf65020-cdd99510-1fd7-842b2b05c775  esxisrv_localdisk         4cf65020-cdd99510-1fd7-842b2b05c775     true VMFS-5         140660178944         139948195840

/vmfs/volumes/51646b41-f9944c94-40a8-001018846202  Secondary_Raid_10_7k_vd5  51646b41-f9944c94-40a8-001018846202     true VMFS-3        1395595935744         860713123840

/vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a  Raid_5_15k_vd1            4f7f2302-d03c7288-67d4-00101884620a     true VMFS-3  7021786302599163750  1248136196516808178

/vmfs/volumes/4d7fc952-1264e9f8-07ff-001018846070  Utility Storage           4d7fc952-1264e9f8-07ff-001018846070     true VMFS-5         999385202688         676720541696

~ # esxcli storage filesystem unmount -p /vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a

Volume '/vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a' cannot be unmounted. Reason: Busy

~ # esxcfg-scsidevs -m

naa.6842b2b01c2d4300146d3b1e08686e34:3                  /vmfs/devices/disks/naa.6842b2b01c2d4300146d3b1e08686e34:3 4cf65020-cdd99510-1fd7-842b2b05c775 0  esxisrv_localdisk

naa.690b11c000377c4e000001ff51649fa5:1                  /vmfs/devices/disks/naa.690b11c000377c4e000001ff51649fa5:1 51646b41-f9944c94-40a8-001018846202 0  Secondary_Raid_10_7k_vd5

naa.6842b2b00066caec000084bb4f7e9e4e:1                   :1 4f7f2302-d03c7288-67d4-00101884620a 0  Raid_5_15k_vd1

naa.6842b2b00066caec00000da24d7f5fac:1                  /vmfs/devices/disks/naa.6842b2b00066caec00000da24d7f5fac:1 4d7fc952-1264e9f8-07ff-001018846070 0  Utility Storage

~ # vsish -e ls /storage/scsifw/devices/naa.6842b2b00066caec000084bb4f7e9e4e/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done

8216      idle0              

8881      helper41-4         

8291      OCFlush            

8878      helper41-1

8891      helper41-14        

11326 11326 vmx                  /bin/vmx

11333 11326 vmx-vthread-7:termsrv09 /bin/vmx

12256 11326 vmx-svga:termsrv09   /bin/vmx

12258 11326 vmx-vcpu-1:termsrv09 /bin/vmx

12259 11326 vmx-vcpu-2:termsrv09 /bin/vmx

12260 11326 vmx-vcpu-3:termsrv09 /bin/vmx

8293      BCFlush            

11333 11326 vmx-vthread-7:termsrv09 /bin/vmx

11327      vmm0:termsrv09     

11329      vmast.11327        

12257 11326 vmx-vcpu-0:termsrv09 /bin/vmx

12258 11326 vmx-vcpu-1:termsrv09 /bin/vmx

8267      helper19-0         

12259 11326 vmx-vcpu-2:termsrv09 /bin/vmx

8883      helper41-6         

8882      helper41-5         

8879      helper41-2         

8890      helper41-13        

3018890 3018867 hostd-worker         hostd

8887      helper41-10        

8886      helper41-9         

8877      helper41-0         

1938877 1946811 vmx-vcpu-0:srv-app /bin/vmx

8892      helper41-15        

3018892 3018892 nssquery /usr/libexec/hostd/nssquery

8880      helper41-3         

8884      helper41-7         

8889      helper41-12        

3018889 3018867 hostd-worker         hostd

8244      helper13-2         

3018244 3018240 sfcb-vmware_aux      /sbin/sfcbd

8888      helper41-11        

3018888 3018867 hostd-poll           hostd

8242      helper13-0         

3018242 3018240 sfcb-vmware_aux      /sbin/sfcbd

12260 11326 vmx-vcpu-3:termsrv09 /bin/vmx

8228      helper3-1          

82287 82064 vmx-vthread-6:kvchecho /bin/vmx

82288 82064 vmx-mks:srvecho     /bin/vmx

82289 82064 vmx-svga:srvecho    /bin/vmx

8227      helper3-0          

8229      helper3-2          

82290 82064 vmx-vcpu-0:srvecho  /bin/vmx

82291 82064 vmx-vcpu-1:srvhecho  /bin/vmx

3018229 3018229 openwsmand /sbin/openwsmand

3018230 3018229 openwsmand /sbin/openwsmand

3018231 3018229 openwsmand /sbin/openwsmand

8863      helper39-0         

8243      helper13-1        

12255 11326 vmx-mks:termsrv09    /bin/vmx

~ # kill 11326

~ # kill 11327

~ # vsish -e ls /storage/scsifw/devices/naa.6842b2b00066caec000084bb4f7e9e4e/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done

8216      idle0              

8881      helper41-4         

8291      OCFlush            

8878      helper41-1         

8885      helper41-8         

8891      helper41-14        

8293      BCFlush            

8267      helper19-0         

8883      helper41-6         

8882      helper41-5         

8879      helper41-2         

8890      helper41-13        

8887      helper41-10       

8886      helper41-9  

8877      helper41-0         

1938877 1946811 vmx-vcpu-0:srv-app /bin/vmx

8892      helper41-15        

8880      helper41-3         

8884      helper41-7         

8889      helper41-12        

8244      helper13-2         

8888      helper41-11        

8242      helper13-0         

8228      helper3-1          

8227      helper3-0          

8229      helper3-2          

8863      helper39-0         

8243      helper13-1

~ # esxcli storage core adapter rescan --all

~ # esxcli storage filesystem list

Mount Point                                        Volume Name               UUID                                 Mounted Type    Size                 Free

-------------------------------------------------  ------------------------  -----------------------------------  ------- ------  -------------------  -------------------

/vmfs/volumes/4cf65020-cdd99510-1fd7-842b2b05c775  esxisrv_localdisk         4cf65020-cdd99510-1fd7-842b2b05c775     true VMFS-5         140660178944         139948195840

/vmfs/volumes/51646b41-f9944c94-40a8-001018846202  Secondary_Raid_10_7k_vd5  51646b41-f9944c94-40a8-001018846202     true VMFS-3        1395595935744         860713123840

/vmfs/volumes/4d7fc952-1264e9f8-07ff-001018846070  Utility Storage           4d7fc952-1264e9f8-07ff-001018846070     true VMFS-5         999385202688         676720541696

jgaleano
Enthusiast

Thanks John, this was very helpful as we had the same error trying to unmount an iSCSI datastore.

Groffskiii
Contributor

I am pretty sure you just needed to edit your scratch config as it was using that datastore:

Creating a persistent scratch location for ESXi 4.x/5.x/6.x (1033696) | VMware KB

Just set it to /tmp,

or create a folder in /tmp named scratch and set it to /tmp/scratch.

Reboot and you are good to go.
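
One way to change that setting from the host shell, per the KB above, is via vim-cmd (the /tmp/scratch path is just the example from the KB; treat this as a sketch):

# create the new scratch directory
mkdir /tmp/scratch

# point the host at it; the change takes effect after a reboot
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch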
