We are doing some cleanup on our datastores: Storage vMotioning VMs off some datastores and then deleting the LUNs.
For every datastore we clean up, about 3 out of 16 hosts won't unmount it. It is not always the same 3 hosts; it varies.
The error indicates that the file system is busy:
Cannot unmount volume 'Datastore Name: Hosted_FC_LUN70 VMFS uuid: 5202965c-e09a124d-9e1b-001a64c85540' because file system is busy. Correct the problem and retry the operation.
But nothing on the datastore is in use. All VMs have been migrated off, the datastore is not used for heartbeating, vsantraced is disabled, SIOC is disabled, it is not part of a datastore cluster, and the coredumps have all been cleared.
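For anyone else working through the same checklist, the checks look something like this from the shell (note: vsantraced only exists on 5.5 builds, so skip those lines on older hosts):

# confirm no active coredump partition lives on the LUN being removed
esxcli system coredump partition get
# stop and disable the vSAN trace daemon so it releases the volume
/etc/init.d/vsantraced stop
chkconfig vsantraced off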
/vmfs/volumes/5202965c-e09a124d-9e1b-001a64c85540 # ls -allh
total 799744
drwxr-xr-t 1 root root 1.2K Sep 10 15:04 .
drwxr-xr-x 1 root root 512 Sep 10 15:12 ..
-r-------- 1 root root 10.3M Aug 7 2013 .fbb.sf
-r-------- 1 root root 254.7M Aug 7 2013 .fdc.sf
-rwxr-xr-x 1 root root 1.0M Aug 14 2013 .iormstats.sf
-r-------- 1 root root 1.1M Aug 7 2013 .pb2.sf
-r-------- 1 root root 256.0M Aug 7 2013 .pbc.sf
-r-------- 1 root root 250.6M Aug 7 2013 .sbc.sf
-r-------- 1 root root 4.0M Aug 7 2013 .vh.sf
I also ran lsof and could not find any files in use on that LUN. I have also tried unmounting it from the command line and by connecting to the hosts directly with the vSphere Client.
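The lsof check, for reference, was just a grep against the volume UUID:

lsof | grep 5202965c-e09a124d-9e1b-001a64c85540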
Right now the only resolution has been to reboot the hosts that are having the problem. This is getting quite annoying, considering I have 30 datastores to clean up and cannot move on to the next one until the current datastore is completely unmounted and re-purposed.
We also have very strict rules regarding host reboots (they can only be performed in the evening), so at this rate I am looking at a month before all the cleanup can be finished.
These datastores I'm cleaning up are either VMFS 3 or VMFS 5.54.
Does anyone have any ideas on why these won't unmount?
Maybe this helps: http://kb.vmware.com/kb/2011220
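From memory, the sequence that KB walks through on each host is roughly this (the UUID and naa ID below are placeholders, and the flags are worth double-checking against your build):

# unmount the volume by UUID (or use -l with the label)
esxcli storage filesystem unmount -u <vmfs-uuid>
# detach the underlying device so the host stops probing it
esxcli storage core device set --state=off -d <naa.id>
# once the LUN is unpresented from the array, clean up the detached entry
esxcli storage core device detached remove -d <naa.id>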
I tried that and it still wouldn't let me unmount. Same error message. I also have SIOC turned off on the datastore.
Working with VMware Support, we were able to list the processes that are hanging onto that datastore.
~ # vsish -e ls /storage/scsifw/devices/naa.6005076802850d082800000000000005/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done
32789 idle0
33601 helper48-2
32901 OCFlush
33602 helper48-3
33609 helper48-10
33603 helper48-4
33605 helper48-6
32903 BCFlush-0
32885 helper24-0
33600 helper48-1
33612 helper48-13
33610 helper48-11
33607 helper48-8
32840 helper14-0
33614 helper48-15
33599 helper48-0
32841 helper14-1
33613 helper48-14
33606 helper48-7
33480 helper43-0
33608 helper48-9
33611 helper48-12
33604 helper48-5
32842 helper14-2
32824 helper3-1
32823 helper3-0
32904 helper26-0
~ #
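For anyone curious, here is what that one-liner is doing (replace <naa.id> with your LUN's identifier):

# 'vsish -e ls .../worlds/' lists the IDs of the VMkernel worlds holding the device open;
# sed strips the trailing slash from each entry, and the loop matches each ID against ps
vsish -e ls /storage/scsifw/devices/<naa.id>/worlds/ | sed 's:/::' | while read i; do ps | grep "$i"; done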
The problem is that these are all zombie processes and they cannot be killed:
~ # kill 33601
sh: can't kill pid 33601: No such process
~ #
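You can cross-check from the esxcli side as well; in our case the helper worlds never showed up as killable user-world processes (a sketch only, assuming your build has this namespace):

esxcli system process list | grep 33601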
The only way to clear them out is to reboot the host. Does anyone have any insight into why we would be getting so many zombie processes? Every time I want to remove a datastore, we run into this problem.
Hey LeslieBNS9
I know helper processes can spawn when a process is taking too long. Are these old helper processes from when you were troubleshooting the network issue inside your C7000 enclosure, or are they new?
I'm not sure they are related to the C7000 troubleshooting. But now that I think about it, I wonder if they are a result of our vCenter crashing while taking VDP backups. Every time vCenter crashes while the backups are running, we see problematic change-tracking (ctk) files and many VMs that need consolidation. It seems that after I clean out these datastores, the only files left are orphaned ctk and snapshot files.
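If anyone else wants to hunt for those leftovers, a simple find against the volume should turn them up (datastore name is the one from my earlier post; adjust the patterns as needed):

# orphaned change-tracking and snapshot delta files
find /vmfs/volumes/Hosted_FC_LUN70 -name '*-ctk.vmdk' -o -name '*-delta.vmdk'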
I can't say for certain, but it's definitely possible. Does the problem happen on datastores where you don't have these ctk files and orphaned snapshots?
We have them on all of our datastores at this point, but I am wondering whether the helper processes are spawned at the moment those files are created. I think the unmounting problem may just be a symptom of the vCenter crashes that then cause all these other problems we've seen. In theory, as long as vCenter doesn't crash again after I reboot the hosts, the unmounting problem won't recur. Who knows. I guess only time will tell.
Yes, it is. But not the way you tried!
A services restart fixed this for me.
services.sh restart
Then I was able to unmount using
esxcli storage filesystem unmount -l LUNXX
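For what it's worth, the unmount command also takes the UUID or the mount path instead of the label, e.g.:

esxcli storage filesystem unmount -u <vmfs-uuid>
esxcli storage filesystem unmount -p /vmfs/volumes/<vmfs-uuid>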
We've had the same problem several times as well. Restarting the agents, as suggested in one of the replies, worked in a couple of cases; others required a host reboot. We're not using any special backup agents or anything like that - pretty basic ESXi and vCenter configuration with vCOPs and built-in tools. It gets quite frustrating. We're starting to turn off SIOC now; we suspect it was related to some other issues we were seeing, and I suspect it is related to this as well.
Same issue here, and we ended up having to reboot 19 hosts. Support was not able to find a solution but said it should be resolved in 5.5 U2. I cannot confirm that it is resolved, but I can confirm that the only thing that worked was a reboot.
We were never able to narrow down the root cause. It wasn't due to vCenter crashes from VDP or anything else that I can discern. We just bit the bullet and reboot our hosts every time this happens.
I used services.sh and it worked for me.
For reference, there's the VMware KB: Unmounting a datastore used for Host Cache on ESXi 5.5 fails with the error: Cannot unmou...
Maybe you should consider checking it.
Unfortunately, it didn't work in our case. I cannot tell you how many different things we (VMware Support and I) tried, but the end result required killing VMs or rebooting hosts.
Searching the web and finding this thread helped us get our volume unmounted without a reboot of the ESXi server. I thought I would share what we did; it may help someone else with a similar problem.
After a hardware failure of our iSCSI SAN, we were unable to unmount the datastore from the ESXi host. However, we were able to kill the parent processes that were hanging onto the datastore and then rescan the storage adapters, which unmounted and removed the datastore from the host. Some of the processes belonged to machines that were attached to the storage at the time of the SAN failure but no longer existed on the server, and killing those processes was the key to getting ESXi to unmount the datastore.
~ # esxcli storage filesystem list
Mount Point Volume Name UUID Mounted Type Size Free
------------------------------------------------- ------------------------ ----------------------------------- ------- ------ ------------------- -------------------
/vmfs/volumes/4cf65020-cdd99510-1fd7-842b2b05c775 esxisrv_localdisk 4cf65020-cdd99510-1fd7-842b2b05c775 true VMFS-5 140660178944 139948195840
/vmfs/volumes/51646b41-f9944c94-40a8-001018846202 Secondary_Raid_10_7k_vd5 51646b41-f9944c94-40a8-001018846202 true VMFS-3 1395595935744 860713123840
/vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a Raid_5_15k_vd1 4f7f2302-d03c7288-67d4-00101884620a true VMFS-3 7021786302599163750 1248136196516808178
/vmfs/volumes/4d7fc952-1264e9f8-07ff-001018846070 Utility Storage 4d7fc952-1264e9f8-07ff-001018846070 true VMFS-5 999385202688 676720541696
~ # esxcli storage filesystem unmount -p /vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a
Volume '/vmfs/volumes/4f7f2302-d03c7288-67d4-00101884620a' cannot be unmounted. Reason: Busy
~ # esxcfg-scsidevs -m
naa.6842b2b01c2d4300146d3b1e08686e34:3 /vmfs/devices/disks/naa.6842b2b01c2d4300146d3b1e08686e34:3 4cf65020-cdd99510-1fd7-842b2b05c775 0 esxisrv_localdisk
naa.690b11c000377c4e000001ff51649fa5:1 /vmfs/devices/disks/naa.690b11c000377c4e000001ff51649fa5:1 51646b41-f9944c94-40a8-001018846202 0 Secondary_Raid_10_7k_vd5
naa.6842b2b00066caec000084bb4f7e9e4e:1 /vmfs/devices/disks/naa.6842b2b00066caec000084bb4f7e9e4e:1 4f7f2302-d03c7288-67d4-00101884620a 0 Raid_5_15k_vd1
naa.6842b2b00066caec00000da24d7f5fac:1 /vmfs/devices/disks/naa.6842b2b00066caec00000da24d7f5fac:1 4d7fc952-1264e9f8-07ff-001018846070 0 Utility Storage
~ # vsish -e ls /storage/scsifw/devices/naa.6842b2b00066caec000084bb4f7e9e4e/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done
8216 idle0
8881 helper41-4
8291 OCFlush
8878 helper41-1
8891 helper41-14
11326 11326 vmx /bin/vmx
11333 11326 vmx-vthread-7:termsrv09 /bin/vmx
12256 11326 vmx-svga:termsrv09 /bin/vmx
12258 11326 vmx-vcpu-1:termsrv09 /bin/vmx
12259 11326 vmx-vcpu-2:termsrv09 /bin/vmx
12260 11326 vmx-vcpu-3:termsrv09 /bin/vmx
8293 BCFlush
11333 11326 vmx-vthread-7:termsrv09 /bin/vmx
11327 vmm0:termsrv09
11329 vmast.11327
12257 11326 vmx-vcpu-0:termsrv09 /bin/vmx
12258 11326 vmx-vcpu-1:termsrv09 /bin/vmx
8267 helper19-0
12259 11326 vmx-vcpu-2:termsrv09 /bin/vmx
8883 helper41-6
8882 helper41-5
8879 helper41-2
8890 helper41-13
3018890 3018867 hostd-worker hostd
8887 helper41-10
8886 helper41-9
8877 helper41-0
1938877 1946811 vmx-vcpu-0:srv-app /bin/vmx
8892 helper41-15
3018892 3018892 nssquery /usr/libexec/hostd/nssquery
8880 helper41-3
8884 helper41-7
8889 helper41-12
3018889 3018867 hostd-worker hostd
8244 helper13-2
3018244 3018240 sfcb-vmware_aux /sbin/sfcbd
8888 helper41-11
3018888 3018867 hostd-poll hostd
8242 helper13-0
3018242 3018240 sfcb-vmware_aux /sbin/sfcbd
12260 11326 vmx-vcpu-3:termsrv09 /bin/vmx
8228 helper3-1
82287 82064 vmx-vthread-6:kvchecho /bin/vmx
82288 82064 vmx-mks:srvecho /bin/vmx
82289 82064 vmx-svga:srvecho /bin/vmx
8227 helper3-0
8229 helper3-2
82290 82064 vmx-vcpu-0:srvecho /bin/vmx
82291 82064 vmx-vcpu-1:srvhecho /bin/vmx
3018229 3018229 openwsmand /sbin/openwsmand
3018230 3018229 openwsmand /sbin/openwsmand
3018231 3018229 openwsmand /sbin/openwsmand
8863 helper39-0
8243 helper13-1
12255 11326 vmx-mks:termsrv09 /bin/vmx
~ # kill 11326
~ # kill 11327
~ # vsish -e ls /storage/scsifw/devices/naa.6842b2b00066caec000084bb4f7e9e4e/worlds/ |sed 's:/::' |while read i;do ps |grep $i;done
8216 idle0
8881 helper41-4
8291 OCFlush
8878 helper41-1
8885 helper41-8
8891 helper41-14
8293 BCFlush
8267 helper19-0
8883 helper41-6
8882 helper41-5
8879 helper41-2
8890 helper41-13
8887 helper41-10
8886 helper41-9
8877 helper41-0
1938877 1946811 vmx-vcpu-0:srv-app /bin/vmx
8892 helper41-15
8880 helper41-3
8884 helper41-7
8889 helper41-12
8244 helper13-2
8888 helper41-11
8242 helper13-0
8228 helper3-1
8227 helper3-0
8229 helper3-2
8863 helper39-0
8243 helper13-1
~ # esxcli storage core adapter rescan --all
~ # esxcli storage filesystem list
Mount Point Volume Name UUID Mounted Type Size Free
------------------------------------------------- ------------------------ ----------------------------------- ------- ------ ------------------- -------------------
/vmfs/volumes/4cf65020-cdd99510-1fd7-842b2b05c775 esxisrv_localdisk 4cf65020-cdd99510-1fd7-842b2b05c775 true VMFS-5 140660178944 139948195840
/vmfs/volumes/51646b41-f9944c94-40a8-001018846202 Secondary_Raid_10_7k_vd5 51646b41-f9944c94-40a8-001018846202 true VMFS-3 1395595935744 860713123840
/vmfs/volumes/4d7fc952-1264e9f8-07ff-001018846070 Utility Storage 4d7fc952-1264e9f8-07ff-001018846070 true VMFS-5 999385202688 676720541696
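One note on the kill step: for the vmx worlds there is a gentler esxcli route that can escalate from soft to hard to force if needed (world ID 11326 taken from the listing above):

esxcli vm process list
esxcli vm process kill --type=soft --world-id=11326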
Thanks John, this was very helpful, as we had the same error trying to unmount an iSCSI datastore.
I am pretty sure you just needed to edit your scratch config, as it was using that datastore:
Creating a persistent scratch location for ESXi 4.x/5.x/6.x (1033696) | VMware KB
Just set it to /tmp, or create a folder in /tmp named scratch and set it to /tmp/scratch.
Reboot and you are good to go.
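If you'd rather do it from the shell than the vSphere Client, the advanced option can be set with vim-cmd (syntax as I remember it from that KB, so verify on your build):

vim-cmd hostsvc/advopt update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch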