VMware Cloud Community
kehall
Contributor

ESX3.5 - NAS Datastore inactive on reboot

Hi,

We upgraded one of our hosts from 3.0.2 to 3.5 and applied the four critical patches currently available.

In the VI Client (connected via VirtualCenter or directly to the host), the NFS datastores show as inactive.

VMotion is unable to move VMs to this host.

Double-clicking on the NFS datastore opens it up and shows the VM data fine.

Refresh in VI (either on the network page or right-clicking the datastore) doesn't fix it - it may update the free space, but the datastore remains "inactive" and greyed out.

On the service console, vdf shows the mounts just fine, and I can enter into the /vmfs/volumes/[datastore] folder and read/write just fine.

The only way to regain connectivity is to delete the NFS datastore mounts and re-add them (a time-consuming process). Then it works again until the next reboot, when they're inactive once more!
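
For the record, the remove/re-add cycle from the service console looks roughly like this (the datastore label, filer address, and export path below are just placeholders, not our real ones):

esxcfg-nas -d nfs_datastore1                                         # remove the stale mount (label is an example)
esxcfg-nas -a -o filer01.example.com -s /vol/vmdata nfs_datastore1   # re-add it (host and export path are examples)
esxcfg-nas -l                                                        # confirm it shows as mounted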

This is connected to an NFS export on a NetApp filer, and I've not seen this issue with the other 3.0.x hosts. I've searched the logs in /var/log/ for "nfs" and "nas" but found nothing of much interest (though I'm not really sure where to look).

Any ideas?

Regards,

Keith.

kehall
Contributor

oh, and "esxcfg-nas -r" doesn't help either 😕

Stewarts
Contributor

I have the same problem as you. I'm on ESX 3.5 Update 1, build 98103. The other 4 hosts that were patched connect to the NAS just fine, but one host will not. The only way I have found around this is to reboot the NAS (which fortunately only holds my ISOs and is a Windows 2003 R2 server).

Same situation, though: I can browse it through the console and through the VIC, but VirtualCenter says it's "inactive" and I can't point any VM's CD-ROM at any ISOs.

jnickel
Contributor

I too am having the same problem. ESX 3.5 Build 98103

I had no problem adding the NFS mount initially... then after a reboot, the NFS mount is stuck as (Inactive).

vmkping to the nfs server works fine. If I delete the NFS mount and try to add it again, that doesn't work either. I have other ESX servers that are still using the NFS, so I am scared to reboot them now.

The log on the NFS server shows a successful connection, but the log on the ESX host shows an error 13 (timeout).

Any ideas what to look for next? I tried esxcfg-firewall -e nfsClient - just in case.
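
For the record, the checks I've run so far from the service console were roughly these (the address below is just a placeholder for our NFS server):

vmkping 192.168.1.50           # vmkernel-level ping of the NFS server (example address)
esxcfg-firewall -q             # show the current service console firewall settings
esxcfg-firewall -e nfsClient   # make sure the NFS client ports are open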

Jim

israndyc
Contributor

I am also having the same issue. Was there ever a response to this with a solution?

Any help would be appreciated.

tvleavitt
Contributor

I'm having this same problem with my SnapServer 520 and have had it for a while; ESXi 3.5 is installed. Every time I reboot the SnapServer, the NFS datastore goes "inactive" in ESX and can't be made active again, and I have to delete the VMs from inventory, delete the datastore, re-add the VMs to inventory, and then restart them. This is a major PITA, and it causes me to lose the historical performance data. It is also forcing me to use local storage for my VMs, which I'm very frustrated by.

elMojo
Contributor

I got it! It's a race condition. My suspicion is that the networking hasn't stabilized by the time the management daemon starts. We run ESXi 3.5 Update 3, and I don't recall whether ESX was affected or not; it's been a while since we switched.

Just like so many others, things would be great until a reboot, then the NFS datastores would appear as Inactive. We could double-click on them and browse them, and esxcfg-nas -r wouldn't help.

Then what? Restarting hostd would do it:

/etc/init.d/hostd restart

But how do you do that at startup, when hostd is already supposed to be starting up? With a sleep statement in the hostd init script.

vi /etc/init.d/hostd

In the start section, right before the "setsid watchdog.sh ..." line (around line 16), put a "sleep 60s". I didn't have much time to test, but I did find that 30 seconds wasn't adequate with my setup; yours may vary, so play around. These boxes already take long enough to reboot, so 60 seconds doesn't really add that much.
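
In other words, the start section ends up looking something like this (the watchdog.sh arguments are elided here; leave yours exactly as they are):

sleep 60s                 # added line: give the network time to settle before hostd starts
setsid watchdog.sh ...    # existing line, unchanged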

By the way, we're using a NetApp FAS270 connected with gigabit ethernet...

Good luck fellow ESXers!

dnetz
Hot Shot

I had this problem on ESXi 3.5.0 (build 110271), running standalone with two NFS mounts to a NetApp FAS2050, until I switched the VMkernel from DHCP to a static IP address. It seems ESX tried to connect to the storage before it had a DHCP lease and never tried again.
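
For anyone who prefers the command line, the rough equivalent is something like the following (the portgroup name and addresses are only examples):

esxcfg-vmknic -l                                                # list VMkernel NICs and their current addressing
esxcfg-vmknic -d "VMkernel"                                     # remove the DHCP-configured VMkernel port
esxcfg-vmknic -a -i 192.168.1.21 -n 255.255.255.0 "VMkernel"    # re-add it with a static address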

My NFS datastores would come up as "inactive" until I refreshed them in the VI Client (Configuration -> Storage), but obviously no VMs would autostart because of this.

Hope it helps!

elMojo
Contributor

Well, that DOES make sense, as far as a race condition goes... but we're just too big to keep track of things like that. There are something like thirty or forty subnets and at least 700 addressable hosts.

We use a homebrew DHCP/DNS management system that generates the necessary config files for named and dhcpd. Everybody gets their address from DHCP, even though it's a fixed lease...

So, in my opinion, VMware should remove the DHCP client option from their product if it doesn't work right...

Does ANYONE have DHCP working right with their ESX[i] setups? Setups with shared NAS storage? Our dhcpd responds quickly to requests every time, much like a simpler router with a single subnet would, so I'm curious whether this condition is limited to people with something close to this specific setup: a NetApp FAS product with ESXi...

Thank you for your suggestion :) As long as I put the fixed lease into our management software, I can set the blades to the same addresses statically. That might be easier/more intuitive than editing the hostd file...

Mojo

Toonix
Contributor

Has anyone solved this problem? I'm on 3.5.0 build 130756, and the only thing that works is to remove the NAS and recreate the datastore, which is pretty uncool since I have more than one server and I have to do this on all of them every time someone reboots the NAS.

vdf
Could not open /vmfs/volumes/81f1bfb9-1630db31. Error: No such file or directory
/vmfs/volumes/94c1a955-879ef166

root]# esxcfg-nas -l
DiskStation CC is /volume1/VMWare from 89.1.200.51 mounted
DiskStation BB is /volume1/VMWare from 89.1.200.50 mounted
elMojo
Contributor

The problem I was having was when someone rebooted the host, not the NAS. Using ESXi, I was able to create a workaround (it should be detailed above), but it involves entering the 'unsupported' shell and modifying a file. Since it's your NAS rebooting and not your host, and I'm not sure whether you're using ESX or ESXi, I'm not sure what to recommend...

sicoffey
Contributor

Hi Toonix

I am seeing your problem too (NFS datastore inactive after a NAS reboot - not quite the same as the original poster, I know). Did you find a resolution?

cheers

elMojo
Contributor

Kind of, but only if you're using ESXi. Edit /etc/init.d/hostd and add "sleep 60s" right before the "setsid watchdog.sh ..." line... but not in ESX, sorry.

sicoffey
Contributor

Hi elMojo

Thanks for your quick response... as it happens, I am in fact on ESXi, not ESX. On the downside, I have the problem on NAS reboot rather than on ESXi host reboot... I see your fix will address the latter by delaying boot. I guess the root cause of these problems is the same - ESX(i) doesn't retry the connection to an NFS datastore?

elMojo
Contributor

Oops, sorry, you were asking Toonix :D

elMojo
Contributor

Well, I'm not sure my fix will help you -- in my case, ESXi was attempting to connect to the NAS before ESXi's networking was even up! Funny. But in your case, ESXi's networking was up and stayed up; maybe it just didn't feel like trying again...

javelin
Contributor

I'm running ESX 3i 3.5.0 and have run into the same issue where the NAS is inactive after rebooting my host. Refresh does not work, and I didn't bother going into the restricted shell.

My datastore is actually an NFS mount from another Linux system. To resolve my problem, I simply restarted the nfs daemon on the Linux system and my datastores came up automatically.
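
Exactly how you restart it depends on the distribution; on a Red Hat-style system it's roughly

service nfs restart

and on a Debian-style system something like

/etc/init.d/nfs-kernel-server restart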

Not a very complicated setup, but if someone is playing with 3i like I am, is using NFS for a datastore, and comes across this problem, this should be a simple fix.

jav

elMojo
Contributor

It's just not very automatic... and for some reason restarting the NFS server didn't work for me! Not sure why.

We have three NFS datastores and it is imperative that it all happens automagically :) Thanks for the info though.


RobAtHomeNet
Enthusiast

Has this issue been resolved in ESXi 4.1? I'm having the same issues in ESXi 4.0.

Today, I had to take down my NFS share (Synology RS810RP+ running DSM 2.3-1161) and move it to another rack. Doing so meant it was down for a few minutes. When I brought it back up, all 4 of my ESXi hosts could not start any of the guests that used that NFS share. We're only testing with this thing and the guests are NOT mission-critical, so I didn't bother rebooting the hosts. I did try doing a /etc/init.d/hostd restart, but that didn't help. I'll try rebooting the hosts in the morning, but this needs to be fixed. If it's fixed in the 4.1 update, then I'll be golden!
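
For what it's worth, ESXi 4.x also ships a script that restarts all of the management agents rather than just hostd, something along the lines of:

/sbin/services.sh restart

I haven't verified whether that makes any difference for this particular problem.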

The release notes for the 4.1 update state that there have been tweaks to "improve" NFS capabilities.

2010-09-01, 1549 EDT
