Sporadic, brief "all paths down" status - NetGear ReadyNAS NFS

jlehtinen1 · ‎06-11-2013

Looking for some advice on something that has been driving me nuts for a few months.

I'm running ESXi 5.1 on several Dell PowerEdge R805s. Storage is on two NetGear ReadyNAS 3200s, which serve NFS over gigabit Ethernet.

This setup has been working great, and generally speaking I have no problems. However, on every ESXi host, I can see in the logs that the storage sometimes goes into All Paths Down state, then exits APD state. The APD state almost always lasts for exactly 7 seconds. The timing on when this happens does not seem to coincide with anything else that I can tell - it can happen at any time of day. Sometimes it happens multiple times in one day. Other times it might be a week between errors. Sometimes the error is on ALL the ESXi hosts for the same 7 seconds, other times it is only on one ESXi host.

Error example:

It's not an issue for the most part, and performance and operation do not seem to be impacted. HOWEVER - every 3-4 months or so the ReadyNAS units go completely unresponsive. I have to manually power them off and back on again to restore access for the ESXi hosts. I'm wondering if this is related to the sporadic APD error...
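To check whether the APD windows really are a constant 7 seconds, and whether several hosts hit them at the same moment, the enter/exit pairs can be extracted from each host's log and compared. Below is a minimal Python sketch. It assumes each relevant line starts with an ISO 8601 timestamp and contains the phrase "has entered/exited the All Paths Down state" with the identifier in square brackets. The sample lines and the identifier `datastore-a` are made up for illustration, so adjust the pattern to whatever your logs actually contain.

```python
import re
from datetime import datetime

# ASSUMED line shape (not taken from this thread): ISO 8601 timestamp first,
# then "identifier [X] has entered|exited the All Paths Down state".
APD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?.*?"
    r"identifier \[(?P<dev>[^\]]+)\] has (?P<what>entered|exited) "
    r"the All Paths Down state"
)

def apd_windows(lines):
    """Yield (identifier, start_time, duration_seconds) per enter/exit pair."""
    open_events = {}  # identifier -> time it entered APD
    for line in lines:
        m = APD_RE.search(line)
        if not m:
            continue
        # Fractional seconds are dropped; whole-second resolution is used.
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S")
        dev = m.group("dev")
        if m.group("what") == "entered":
            open_events[dev] = ts
        elif dev in open_events:
            start = open_events.pop(dev)
            yield dev, start, (ts - start).total_seconds()

# Made-up sample lines, for illustration only.
SAMPLE = [
    "2013-06-11T10:15:32.123Z: Device or filesystem with identifier "
    "[datastore-a] has entered the All Paths Down state.",
    "2013-06-11T10:15:39.456Z: Device or filesystem with identifier "
    "[datastore-a] has exited the All Paths Down state.",
]

if __name__ == "__main__":
    for dev, start, seconds in apd_windows(SAMPLE):
        print(dev, start.isoformat(), seconds)
```

Running this over the logs from every host and sorting the results by start time makes it easy to see which events hit all hosts at once and which hit only one.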

Things I've tried:

- Contacted ReadyNAS support for troubleshooting (spent a LOT of time with them, they really have no idea what is causing this)

- Replaced the gigabit switch

- Upgraded firmware on ReadyNAS

- Tried both teamed and active/backup configurations on NAS NICs

Any ideas?

admin · ‎06-11-2013

Is the firmware version on the NAS RAIDiator-x86 4.2.20?

jlehtinen1 · ‎06-12-2013

All of the ReadyNAS units are running RAIDiator 4.2.22.

yuefeng · ‎04-09-2014

Hi,

I've hit the same issue in a very similar environment. Have you solved it yet? I'm on vSphere 5.1.0, with HP blade 460c G7 servers as ESXi hosts and a FAS6240 as the NFS server.

andyoakeley · ‎05-06-2014

Hi,

Did you ever get this issue resolved? I'm having identical issues.

Andrew

ikiris · ‎07-29-2014

Has anyone resolved this issue? I'm seeing the same issue on 5.1u2 and 5.5u1 (with APD patch). HP Blade 460c G6/G7/Gen8 with v3170 and v3270s as NFS servers on 8.1.2 and 8.2P3.

-Chris- http://www.twitter.com/ikiris http://blog.chrischua.net

andyoakeley · ‎07-29-2014

I have had an open case with both NetGear and VMware for 2-3 months now.

We have

- replaced the NetGear chassis

- re-installed various versions of VMware

- tried various NFS settings in VMware

- dumped logs

- blah blah blah

- VMware are being more useful than NetGear, who pretty much said there is nothing more they can do.

I am down to trying to catch a tcpdump of traffic on the storage interface, but it is proving hard to catch in the act.

I do have one guest that, IF I move it back onto the storage, will cause the issue every hour, at the same time each hour. But I am reluctant to move it back due to the user impact this causes.

- When I say same time each hour, it is 10 mins after the particular guest restarts and then every hour after that, give or take a few mins. Believe me, I have gone through this guest with a fine-tooth comb to see what might be initiating it.

- With this particular guest off the storage I only get sporadic APD.
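One way to put a number on the "every hour, give or take a few mins" pattern is to measure what fraction of the gaps between consecutive APD events fall close to one hour. The following is only a rough Python sketch: the function name `hourly_fraction`, the 5-minute tolerance and the timestamps in the demo are assumptions for illustration, not values taken from any real log.

```python
from datetime import datetime, timedelta

def hourly_fraction(events,
                    period=timedelta(hours=1),
                    tolerance=timedelta(minutes=5)):
    """Return the fraction of consecutive gaps within `tolerance` of `period`.

    `events` is any iterable of datetime objects (APD start times).
    Returns 0.0 when there are fewer than two events.
    """
    ordered = sorted(events)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    hits = sum(1 for gap in gaps if abs(gap - period) <= tolerance)
    return hits / len(gaps)

if __name__ == "__main__":
    # Made-up timestamps: three roughly hourly events, then one stray event.
    demo = [
        datetime(2014, 7, 29, 10, 10),
        datetime(2014, 7, 29, 11, 12),
        datetime(2014, 7, 29, 12, 9),
        datetime(2014, 7, 29, 15, 40),
    ]
    print(f"{hourly_fraction(demo):.2f} of gaps are about one hour")
```

A value near 1.0 with the guest on the storage, and a much lower value with it moved off, would back up the suspicion that this guest drives the periodic events. It would also tell you roughly when to have tcpdump already running.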

yuefeng · ‎07-29-2014

I hit a similar issue in April and finally found the root cause by analyzing packet captures of the traffic from the ESXi host to the NFS storage (NetApp FAS6240). The error I found in the capture was:

TCP zero window. This means the receiver was advertising no free buffer space, i.e. more I/O was being sent to the storage than it could keep up with.

The cause was a new application deployed on VMware that was running 7 MySQL database instances. The all paths down errors stopped after we shut that application down, and the developers eventually moved the databases off that virtual machine.
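For anyone wanting to look for the same thing in their own captures: a zero-window segment is one whose 16-bit window field in the TCP header is 0. Wireshark flags it as "TCP ZeroWindow" when the window is zero and none of SYN, FIN or RST are set. Here is a small Python sketch of that check on a raw 20-byte TCP header. The `make_header` helper and the port numbers in it exist only to build demo input and are not part of any capture tool.

```python
import struct

# TCP flag bits in byte 13 of the header.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

def is_zero_window(tcp_header: bytes) -> bool:
    """True if the segment advertises a zero receive window.

    Mirrors the usual analyzer rule: window == 0 and none of
    SYN, FIN, RST set. Expects at least the fixed 20-byte header.
    """
    if len(tcp_header) < 20:
        raise ValueError("need at least 20 bytes of TCP header")
    flags = tcp_header[13]
    (window,) = struct.unpack_from("!H", tcp_header, 14)  # bytes 14-15
    return window == 0 and not (flags & (FIN | SYN | RST))

def make_header(flags: int, window: int) -> bytes:
    """Hypothetical helper: build a minimal TCP header for the demo below."""
    return struct.pack(
        "!HHIIBBHHH",
        2049,      # source port (2049 is the standard NFS port)
        50000,     # destination port, arbitrary
        0, 0,      # sequence and acknowledgment numbers
        5 << 4,    # data offset: 5 words, no options
        flags,
        window,
        0, 0,      # checksum and urgent pointer, left as zero
    )

if __name__ == "__main__":
    print(is_zero_window(make_header(ACK, 0)))      # plain ACK, window 0
    print(is_zero_window(make_header(ACK, 8192)))   # normal window
```

A zero raw window stays zero regardless of window scaling, so the check does not need the scale factor negotiated in the handshake.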
