VMware Cloud Community
Cy-it
Contributor

Lost access to volumes

Dear all,

I am getting the following event in the ESX tasks and events:

Lost access to volume 496befed-1c79c817-6beb-001ec9b60619 (san-lun-100) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

The storage system is an IBM DS 4800 (FastT 1815), 4 Gb Fibre Channel.

The host adapters are QLogic Corp. ISP2422-based 4Gb Fibre Channel to PCI-X HBAs.

I recently upgraded to ESX 4.1 Update 1. Could that be the reason?

How should I troubleshoot this event?

12 Replies
idle-jam
Immortal

How about the other hosts that are on 4.1? Are they able to access the volume?

Cy-it
Contributor

There is no problem with access itself, just flapping connectivity.

AndreTheGiant
Immortal

Have you checked the HCL to see if there are specific notes for 4.1?

Which kind of multipathing are you using?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
Cy-it
Contributor

I have other clusters that are running 4.1, though not Update 1, and they are just fine, or at least they don't generate any logs.

If I could find out what is causing the disconnection, it would be easier to find the source of the issue.

AndreTheGiant
Immortal

Is the multipath setting the same?

Do you have any logs on the storage side?

Andre

Cy-it
Contributor

Yes, all paths are set to “Fixed (VMware)”.

On the storage side everything is clear....

AndreTheGiant
Immortal

Very strange. It could be an issue with the HBA firmware. Is it the latest version?

Is there more information in /var/log/vmkernel?

Andre

opbz
Hot Shot

You might want to check whether Fixed is the correct setting. It might be an idea to change to Round Robin.

I have seen Fixed cause flapping on EMC CXs without the ALUA FLARE.

I would suggest you do the following:

From the VMware side:

Check your storage adapters. Can you see your storage?

If you can, then this might be a signature issue. Go to storage add... and see if you can see the existing VMFS volumes. You might have to resignature them or allow access to them (depending on what you want).

If you cannot, then check your switches and then your storage. If all looks OK there, do a rescan from the top right-hand side of the Storage Adapters page.

Hope this helps.

Cy-it
Contributor

Yes, actually there is:

Apr 18 13:04:28 tesx04 vmkernel: 1:18:36:03.097 cpu1:4213)NMP: nmp_CompleteCommandForPath: Command 0x2a (0x41027f360c40) to NMP device "naa.600a0b8000471b8a00000544480fe528" failed on physical path "vmhba3:C0:T0:L0" H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Apr 18 13:04:28 tesx04 vmkernel: 1:18:36:03.097 cpu1:4213)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device "naa.600a0b8000471b8a00000544480fe528" state in doubt; requested fast path state update...

Apr 18 13:04:28 tesx04 vmkernel: 1:18:36:03.097 cpu1:4213)ScsiDeviceIO: 1672: Command 0x2a to device "naa.600a0b8000471b8a00000544480fe528" failed H:0x8 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Apr 18 13:04:28 tesx04 vmkernel: 1:18:36:03.097 cpu1:4213)<6>qla2xxx 0000:08:01.1: scsi(6:0:1): Abort command issued -- 1 d7ca91 2002.
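
For anyone trying to read lines like these, here is a rough Python sketch that pulls the device, path and status fields out of the nmp_CompleteCommandForPath entries and counts failures per path, which may help show whether the flapping is tied to one HBA or path. The function names and the regular expression are my own and only match the format quoted above. The host status table is partial and reflects my understanding of VMware's documentation of the H: codes, so please verify it before relying on it. As far as I know, 0x2a is the standard SCSI WRITE(10) opcode and H:0x8 corresponds to a host-side reset.

```python
import re
from collections import Counter

# Partial table of host (H:) status names, as I understand VMware's
# documentation of these codes. Treat it as an assumption and verify it.
HOST_STATUS = {
    0x0: "OK",
    0x1: "NO_CONNECT",
    0x2: "BUS_BUSY",
    0x3: "TIMEOUT",
    0x5: "ABORT",
    0x7: "ERROR",
    0x8: "RESET",
}

# Two standard SCSI opcodes; anything else is reported as the raw hex value.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches only the nmp_CompleteCommandForPath format quoted in this thread.
NMP_FAILURE = re.compile(
    r'Command (0x[0-9a-fA-F]+) \(0x[0-9a-fA-F]+\) to NMP device "([^"]+)" '
    r'failed on physical path "([^"]+)" '
    r'H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)'
)


def parse_nmp_failure(line):
    """Return the decoded fields of one NMP failure line, or None if no match."""
    match = NMP_FAILURE.search(line)
    if match is None:
        return None
    opcode, device, path, host, dev, plugin = match.groups()
    return {
        "device": device,
        "path": path,
        "opcode": OPCODES.get(int(opcode, 16), opcode),
        "host_status": HOST_STATUS.get(int(host, 16), host),
        "device_status": int(dev, 16),
        "plugin_status": int(plugin, 16),
    }


def count_failures_by_path(lines):
    """Count NMP failures per (physical path, host status) pair."""
    counts = Counter()
    for line in lines:
        record = parse_nmp_failure(line)
        if record is not None:
            counts[(record["path"], record["host_status"])] += 1
    return counts
```

Feeding the lines of /var/log/vmkernel to count_failures_by_path gives a quick per-path tally. If all failures land on a single vmhba, that would point toward that HBA, its cable or its switch port rather than the array.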

AndreTheGiant
Immortal

If you have an active SnS contract, I suggest opening a support request with VMware.

There is a long thread with similar errors, though in different environments: http://communities.vmware.com/message/1321985

krowczynski
Virtuoso

Hi,

I am having the same problem, with lots of events in vCenter saying that the connection to storage was lost.

I am running an EMC AX4-5 with DataCore SANSYMPHONY-V, and we are using Round Robin as the path selection policy.

MCP, VCP3 , VCP4
NicolasGr
Contributor

Hi krowczynski,

I also use 2 DataCore SANSYMPHONY-V servers with Round Robin and ALUA. We use ESXi 4.1 Update 1.

We have had the same problem for several months and cannot identify where the problem is.

Have you found a solution?

Thank you in advance.

Regards,

Nicolas
