Hello Gurus,
Of late I have noticed an event log entry in the vSphere Client, twice within a span of less than a day, related to local storage, as follows:
Lost access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
info | 8/18/2013 2:01:22 AM | vms-1
and immediately after:
Successfully restored access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) following connectivity issues.
info | 8/18/2013 2:01:22 AM | nesxi1.x.x
The event details themselves recommend the "Ask VMware" link, which leads to VMware KB: Host Connectivity Degraded and VMware KB: Host Connectivity Restored.
As per the KBs, VMware is referring to SAN LUNs, but in our case it's local storage. Kindly shed some light on why local storage would lose its connectivity.
Note: all the local disks are on RAID-10.
thanks
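[Editor's note] A first sanity check for errors like these is whether every "Lost access" event actually got a matching "Successfully restored access" event, and then digging around the same timestamps in /var/log/vmkernel.log. A minimal sketch of the pairing check, using the two event strings quoted above as sample input (the awk logic and the /tmp path are mine, not anything VMware ships):

```shell
# Sample input assembled from the two events quoted in this thread.
cat > /tmp/events.log <<'EOF'
2013-08-18T02:01:22 Lost access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) due to connectivity issues.
2013-08-18T02:01:22 Successfully restored access to volume 4f4c8bc0-4d13eab8-c8fc-5cf3fc09c3fa (vms-1) following connectivity issues.
EOF

# Count "lost" vs "restored" events per volume UUID; any volume where the
# two counts differ had an outage that never recovered.
awk '/Lost access to volume/        {lost[$6]++}
     /Successfully restored access/ {rest[$7]++}
     END {for (v in lost)
            printf "%s lost=%d restored=%d\n", v, lost[v], rest[v]+0}' /tmp/events.log
```

On a real host you would feed this the exported event list or the hostd/vmkernel logs instead of the sample file.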
Hi,
Which ESXi version are you running on?
I had a similar issue with ESXi 5.1 (no update), and after patching it to the latest release, ESXi 5.1 Update 1, the issue was resolved.
Hope this helps,
Steven.
ESXi 5.0, Build 469512
Hello,
I have been experiencing the same issue for a week, and it coincides with the upgrade of our ESX hosts to 5.1.0 build 1157734, but I'm not sure it is related.
Side effects are:
- Very high disk latency peaks (up to 10 s!)
- Instability
- Loss of storage paths on some ESX hosts
- Inconsistencies in some virtual hard disks
Restarting the ESX host solves the problem, but it comes back as soon as there is more disk access (e.g. during backups).
How did you solve the problem?
Thanks a lot for your feedback, and best regards.
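[Editor's note] Since the 10 s latency peaks line up with the backup window, it may be worth catching them live with esxtop during a backup. A sketch, assuming shell access to the host (the output path is my own placeholder):

```shell
# Interactive: run esxtop, press `u` (device view) and watch DAVG/cmd
# (time spent in the array/fabric) vs KAVG/cmd (time queued inside the
# ESXi kernel). High DAVG points at storage; high KAVG at host queueing.
esxtop

# Or record in batch mode through the backup window and graph it later
# (here: 12 samples, 5 seconds apart):
esxtop -b -d 5 -n 12 > /tmp/esxtop-backup-window.csv
```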
I'm running into the same issue with a SAN datastore (VNX5500 array). I'm running ESXi 5.0 (1311175).
Did you guys ever resolve the issue?
Thanks
Exact same issue here. It's killing me. 5.1.0 (1612806), all SAN (EMC CX4), QLogic Fibre Channel HBAs, and new Dell R720s.
It's getting ugly.
Has anyone resolved this issue?
Hi -
Same issue here with 5.0 and VNX 7500.
Has anyone resolved this issue?
So ... any news on this? I've had the same issue for a while and am going to 5.1 U2 this week. Did anyone else have luck resolving this?
I don't want to rule anything out. However, I had to troubleshoot an issue like this a few months ago, and it turned out that a bad fibre cable was causing it. You may want to check the FC switch ports to see whether any of them show e.g. CRC errors.
André
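[Editor's note] To put numbers behind the bad-cable theory, you can also dump FC error counters on the host side (on ESXi 5.1+ there is `esxcli storage san fc stats get`; the exact counter names vary by driver, so the sample data below is only illustrative) and flag anything non-zero:

```shell
# Illustrative "Name: value" counter dump; on a real host you would feed
# this from `esxcli storage san fc stats get` (names may differ).
cat > /tmp/fcstats.txt <<'EOF'
Link Failure Count: 0
Loss of Sync Count: 3
Invalid CRC Count: 17
EOF

# Flag every non-zero counter; a steadily rising CRC count usually means
# a bad cable or SFP, as suggested above.
awk -F': ' '$2 + 0 > 0 {print "CHECK:", $0}' /tmp/fcstats.txt
```

The same counters are visible from the switch side (`porterrshow` on Brocade), which helps narrow down which segment of the path is bad.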
Hi!
Same problem here; the scenario is also similar.
Did you fix this? Any ideas?
My issue was due to a bug between HP blade chassis Virtual Connect and the Nexus 5000, but after my month-long troubleshooting, I suggest anyone suffering from this problem look at everything:
1. Check the HBA firmware/driver; some versions of the Emulex LOM have bugs that exhibit this behavior.
2. If you use Brocade FC switches with HP blades, check the FillWord value in your switch config.
3. If you use HP Virtual Connect FlexFabric with a Nexus 5000 as your FC access switch, there is a bug with 8 Gb FC; upgrade your Virtual Connect firmware or your Nexus OS.
4. Upgrade your VNX FLARE code to the December 2013 level; there is a dramatic improvement in ATS locking offload in that version of FLARE.
5. Check whether your array front-end ports are getting QFULL messages; if so, think about throttling the queue depth on the HBA (there is an ESXi setting for this).
6. Check for bad fibre cables and SFPs on and between the HBAs, FC switches, and array.
Good luck.
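[Editor's note] On point 5, the ESXi setting being referred to is most likely the adaptive queue-depth throttle that reacts to QFULL/BUSY from the array. A hedged sketch; the device ID is a placeholder, and the per-device flags only exist on ESXi 5.1 and later:

```shell
# ESXi 5.1+: throttle per device when the array returns QFULL/BUSY.
# Replace the naa. ID with your own from `esxcli storage core device list`.
esxcli storage core device set -d naa.xxxxxxxxxxxxxxxx \
    --queue-full-sample-size 32 --queue-full-threshold 4

# ESXi 5.0: the equivalent knobs are host-wide advanced settings.
esxcli system settings advanced set -o /Disk/QFullSampleSize -i 32
esxcli system settings advanced set -o /Disk/QFullThreshold -i 4
```

The sample-size/threshold values above are common starting points, not recommendations; tune them against your array vendor's guidance.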
Has anyone had any luck fixing this problem? I have a WD iSCSI drive with the same problem. I have to constantly reboot the ESXi host, and it is causing all of my servers to go down.
I would check the MTU size for network-attached storage.
For the others, I would suggest checking the battery on the RAID controllers, and after that checking all the cables.
I have also seen this issue when servers were installed with the standard image instead of the hardware vendor's customized image.
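[Editor's note] For the MTU check on iSCSI/NAS: if jumbo frames are enabled anywhere in the path, every hop has to agree. One quick end-to-end test from the ESXi host (the target address is a placeholder for your storage IP):

```shell
# 8972 = 9000-byte MTU minus 28 bytes of IP + ICMP headers; -d sets the
# don't-fragment bit so an undersized hop fails instead of fragmenting.
vmkping -d -s 8972 <storage-target-ip>
# If this fails while a plain `vmkping <storage-target-ip>` works, some
# hop (vSwitch, physical switch, or array port) is not set to MTU 9000.
```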
What version of ESXi are you on? If you are on ESXi 5.1, updating it to ESXi 5.5 should solve this issue.
I'm using ESXi 5.5 Build 1746974 and I can still see the error.
Not yet resolved. Any ideas?
I'm having this same issue on a few Cisco R210 and C240 UCS servers; all have local datastores on MegaRAID controllers and run different versions of ESXi:
Cisco C240 - ESXi 5.0 - no issues
Cisco R210 - ESXi 5.0 - disk access issue
Cisco C240 - ESXi 5.1 - disk access issue
Cisco R210 - ESXi 5.0 - disk access issue
Cisco R210 - ESXi 5.0 - disk access issue
Cisco C220 - ESXi 5.5 - no issues
First:
Device naa.600605b005df73201951a1d33bc62893 performance has deteriorated. I/O latency increased from average value of 708 microseconds to 24612 microseconds.
warning | 9/30/2015 4:46:09 AM | 10.2.42.23

Lost access to volume 54383f2f-62e7730b-ec74-4c4e3544bf5e (snap-0a1ec5ee-datastore1) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
info | 9/30/2015 7:39:33 AM | snap-0a1ec5ee-datastore1

Successfully restored access to volume 54383f2f-62e7730b-ec74-4c4e3544bf5e (snap-0a1ec5ee-datastore1) following connectivity issues.
info | 9/30/2015 7:39:46 AM | 10.2.42.23
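[Editor's note] For scale: that warning says average device latency went from 708 µs to 24612 µs, roughly a 35x jump. A quick one-liner to pull the two values out of such messages and print the multiplier (the parsing is my own sketch, keyed to the message wording above):

```shell
msg="I/O latency increased from average value of 708 microseconds to 24612 microseconds."
# Grab the number after "of" (old average) and after "to" (new average),
# then print the deterioration factor.
echo "$msg" | awk '{for (i = 1; i < NF; i++) {
                      if ($i == "of") a = $(i+1)
                      if ($i == "to") b = $(i+1)
                    }
                    printf "%.1fx\n", b / a}'
```

For this message it prints `34.8x`, which is well past the point where guest I/O starts timing out.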
Are you using FIs (Fabric Interconnects) for your rack servers, or the traditional method?
Hello. In my case I was losing connectivity to the datastore, and after some seconds it was restored (VMware 6, IBM x3550 M2, 4 SSDs in RAID 5). I had been looking and reading around for more than a month. After a power loss, when a UPS ran out of batteries, the system couldn't boot properly (it took more than an hour), so I started really examining the system and found that my little button-cell battery was bad. Actually the battery itself was OK (3 volts), but the system showed an error (an LED on the motherboard). After I changed it according to IBM's instructions (power off, etc.), the system volume has worked fine since. It's been one week without any errors.
Hi.
We encountered the same problem. Still troubleshooting. We've been having conference calls with a Dell Master Engineer, two guys from VMware, two from EMC, and one from Brocade.
We have:
- multiple Dell M1000e chassis
  - Dell M630 blades with QLogic QME2662 mezzanine cards
  - Brocade 6505 chassis switches
- multiple Dell R730xd servers
  - QLogic QLE2662 HBAs
- a Brocade 5100 core FC switch
The LUNs are on an EMC VNX 7600.
VMware ESXi 5.5 U2 and 6.0 U1.
We used the Dell custom ISO and the VMware vanilla ISO. On the Dell custom ISOs the qlnativefc driver is really new (v2.x).
We have tried a lot of changes and have been fighting this issue for 3 weeks now.
What we managed to find that seems to be working is the following. Since we have a lot of older servers that work well, we added 4 new paths on the VNX for the new chassis and servers. We also downgraded qlnativefc to the following:
qlnativefc-1.1.20.0
And we changed them all at the same time; when we add one host with the newer driver, the lost datastores seem to start again...
I will keep writing in this post when I have something new, good or bad.
I just started seeing this today as well, with ESXi 6.0 on an HP ProLiant DL380 G6. The volume is on internal hard drives.
It never recovers. The recovery process takes up a lot of CPU, rendering the VMs unresponsive. The only way to recover is to power-cycle the server.
Is this indicative of a failing hard drive or controller?
Thanks.
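[Editor's note] For internal drives behind a RAID controller, two things are worth checking from the host before blaming ESXi (commands assume ESXi 5.1 or later; the device ID is a placeholder, and the grep pattern is only a rough filter since log wording varies by driver):

```shell
# SMART data for the device backing the datastore; get the naa. ID from
# `esxcli storage core device list`.
esxcli storage core device smart get -d naa.xxxxxxxxxxxxxxxx

# Look for SCSI sense data or controller resets around the outage time.
grep -iE 'sense data|H:0x|reset' /var/log/vmkernel.log | tail -20
```

If the controller's battery-backed cache has a dead battery, many controllers silently drop to write-through mode, which can also produce exactly these latency spikes.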