VMware Cloud Community
DAMahoney
Contributor

lost connectivity to storage device affected datastores unknown

Hello,

I am receiving the error "lost connectivity to storage device affected datastores unknown" once a day, at a different time every day. Sometimes it identifies which datastore is affected; so far it has only happened on two datastores. I have been running vSphere for about 10 months without issues and haven't changed or patched anything recently. I have 2 hosts and 2 EqualLogic PS5000 SANs connected through iSCSI. Screenshots of my virtual switches are attached in case that helps.

Thank you for your assistance.

14 Replies
Andy_Banta
Hot Shot

I am receiving the error "lost connectivity to storage device affected datastores unknown" once a day, at a different time every day. Sometimes it identifies which datastore is affected; so far it has only happened on two datastores.

Exact log information would be quite useful. Do you see any lasting effects from this, or is it just a message you see with no other problems seen?

I have 2 hosts and 2 EqualLogic PS5000 SANs connected through iSCSI.

A first guess would be normal load-balancing operations performed by the PS storage. The array disconnects from the host and then expects an immediate reconnection; it does this so connection loads can be spread effectively. But that's just a guess...

Andy

DAMahoney
Contributor

There are no lasting effects at all: the VMs on the datastore stay up and there have been no user complaints. Which logs should I post? I just exported the logs and there are many to choose from. When I do a load-balancing operation on the SAN, should I expect my VMs to go down?

Andy_Banta
Hot Shot

When I do a load-balancing operation on the SAN, should I expect my VMs to go down?

Definitely not. These are just momentary changes to make sure connections to the storage stay evenly balanced. You shouldn't even notice them, except possibly for the message you're seeing, which reports that the connection went away and is being re-established.

Which version and flavor of vSphere are you using? For vSphere 4 ESX, the contents of /var/log/vmkiscsid.log would be interesting. In ESXi, the iscsid info is logged in /var/log/messages. Checking the PS event logs for any corresponding entry could help clear up whether this is expected or not.
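If you want to pull just the iscsid lines out of whichever log applies, a small filter is enough. This is a minimal Python sketch, not a VMware tool: the two paths come from the reply above, `read_iscsid_log` and `filter_iscsid` are hypothetical helper names, and it assumes every relevant line contains the string "iscsid" (as the excerpts later in this thread do).

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Optional

# Where iscsid output ends up on vSphere 4, per the reply above:
# classic ESX has a dedicated file, ESXi folds it into the main messages log.
ISCSID_LOGS = {
    "esx": "/var/log/vmkiscsid.log",
    "esxi": "/var/log/messages",
}


def filter_iscsid(lines: Iterable[str]) -> Iterator[str]:
    """Keep only lines that mention iscsid, with trailing newlines stripped."""
    for line in lines:
        if "iscsid" in line:  # assumption: every iscsid entry carries this tag
            yield line.rstrip("\n")


def read_iscsid_log(flavor: str, path: Optional[str] = None) -> List[str]:
    """Return the iscsid lines for an 'esx' or 'esxi' host.

    `path` overrides the default, e.g. to point at a copy of the log
    taken from an exported support bundle.
    """
    log_path = Path(path) if path else Path(ISCSID_LOGS[flavor.lower()])
    with log_path.open(errors="replace") as handle:
        return list(filter_iscsid(handle))
```

On ESXi this matters more, because /var/log/messages mixes iscsid entries with everything else the host logs.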

Andy

AndreTheGiant
Immortal

Your iSCSI configuration does not follow the suggested guide:

http://www.equallogic.com/resourcecenter/assetview.aspx?id=8453

Reconfigure your hosts, apply all the latest 4.0 patches to vSphere, and install the latest 4.x firmware on the EqualLogic arrays.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
DAMahoney
Contributor

Andy,

The log you requested is attached. I am researching right now how to trigger a load-balancing operation on my EqualLogic SAN.

Andre,

What in my iSCSI configuration is misconfigured? I had Dell do the initial install and haven't made any iSCSI changes. I did upgrade from 3.5 to vSphere myself, though; maybe the iSCSI configuration needs to be different? I have been running vSphere for almost a year and these errors have only appeared in the last few weeks.

Thanks for all of your help,

Dave

DAMahoney
Contributor

I haven't been able to figure out how to trigger load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI? I also noticed that my path selection is set to Fixed for my datastores. Should I change it to Round Robin or Most Recently Used?

Thanks,

Andy_Banta
Hot Shot

I haven't been able to figure out how to trigger load balancing on my EqualLogic PS5000. Do you know if it is done through the GUI or the CLI?

If you log in to the PS as "root" and issue "iscsi_test alogout", it will generate an asynchronous logout on ALL sessions. This is the same event that happens during load-balancing operations. This isn't intended for casual use.

I also noticed that my path selection is set to Fixed for my datastores. Should I change it to Round Robin or Most Recently Used?

Because this is Active/Active storage, Fixed will be the default policy. With EqualLogic, Round Robin works as well. Or, if you're running ESX 4.1, install their MEM (Multipathing Extension Module) and go to town.

Andy

Andy_Banta
Hot Shot

The log you requested is attached. I am researching right now how to trigger a load-balancing operation on my EqualLogic SAN.

Dave,

These log entries show the handling of the load balancing events from the PS. Do these correspond to your original "lost connectivity" messages?

2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection

2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection

2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection

2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)

2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection

2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)

These are normal but relatively rare operations.
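A quick way to check each logout/recovery pair in that excerpt is to compute the gap between the two lines. The Python below is a hypothetical helper, not a VMware tool; it assumes the `YYYY-MM-DD-HH:MM:SS` timestamp prefix shown above and that only one connection is recovering at a time, since the logout lines as quoted do not name the connection.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Timestamp layout used in the iscsid excerpt above, e.g. "2010-07-17-13:59:02".
TS_FORMAT = "%Y-%m-%d-%H:%M:%S"

# The excerpt quoted above, used here as sample input.
SAMPLE = [
    "2010-07-17-13:59:02: iscsid: Target requests logout within 3 seconds for connection",
    "2010-07-17-13:59:06: iscsid: connection6:0 is operational after recovery (2 attempts)",
    "2010-07-18-14:23:07: iscsid: Target requests logout within 3 seconds for connection",
    "2010-07-18-14:23:11: iscsid: connection6:0 is operational after recovery (2 attempts)",
    "2010-07-19-20:56:14: iscsid: Target requests logout within 3 seconds for connection",
    "2010-07-19-20:56:18: iscsid: connection6:0 is operational after recovery (2 attempts)",
    "2010-07-20-03:08:16: iscsid: Target requests logout within 3 seconds for connection",
    "2010-07-20-03:08:20: iscsid: connection6:0 is operational after recovery (2 attempts)",
]


def parse_ts(line: str) -> datetime:
    # The stamp ends at the first ": " (the colons inside the time have no space after them).
    return datetime.strptime(line.split(": ", 1)[0], TS_FORMAT)


def recovery_times(lines: Iterable[str]) -> List[Tuple[datetime, float]]:
    """Pair each target-requested logout with the next recovery line.

    Assumes one connection is affected at a time; interleaved events on
    several connections would be paired incorrectly.
    """
    results = []
    pending: Optional[datetime] = None
    for line in lines:
        if "Target requests logout" in line:
            pending = parse_ts(line)
        elif "is operational after recovery" in line and pending is not None:
            results.append((pending, (parse_ts(line) - pending).total_seconds()))
            pending = None
    return results


for start, seconds in recovery_times(SAMPLE):
    print(f"{start}: reconnected after {seconds:.0f} s")
```

On the excerpt above, every pair is four seconds apart.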

Andy

DAMahoney
Contributor

When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

The lost connectivity message does correspond with a "Target requests logout within 3 seconds for connection" message. If these are normal but rare, do you have any idea why they would happen so often in my environment?

Thanks,

Dave

Andy_Banta
Hot Shot

When you say "This isn't intended for casual use," what do you mean? Should I expect to see an outage when I run that command?

Dave,

All sessions to the PS momentarily drop. If the only hosts connected to the storage are ESX and ESXi, they should all recover in 4 to 5 seconds.

VMs on ESX hosts should have no trouble, other than I/O at that moment taking a few seconds. Other hosts should be able to handle these events as well. What I mean is that it's intended for test purposes only, and you probably shouldn't use it for sport in a production environment.

The lost connectivity message does correspond with a "Target requests logout within 3 seconds for connection" message. If these are normal but rare, do you have any idea why they would happen so often in my environment?

It happened 17 times in two months on your host, based on the log. My guess is that the PS does this when it sees a load imbalance across several ports on the storage (you can ask EqualLogic for more definite information). In that case, the frequency depends on the environment. By "rare" I mean it shouldn't happen every ten minutes; various loads could well make it happen more often than you're seeing. As I mentioned, it's normal, and it's intended to make better use of the network and ports that are available. Despite creating some log clutter, it's supposed to be a good thing.
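To see how often the array is actually doing this on a given host, you can count the target-requested logouts per day and the average spacing between them. This is a minimal Python sketch under the same assumptions as above: the `YYYY-MM-DD-HH:MM:SS` prefix from the quoted iscsid log, and one "Target requests logout" line per load-balancing move. `logout_events` and `summarize` are hypothetical names.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

TS_FORMAT = "%Y-%m-%d-%H:%M:%S"  # matches the iscsid excerpt earlier in the thread


def logout_events(lines: Iterable[str]) -> List[datetime]:
    """Timestamps of every target-requested logout in the log."""
    return [
        datetime.strptime(line.split(": ", 1)[0], TS_FORMAT)
        for line in lines
        if "Target requests logout" in line
    ]


def summarize(events: List[datetime]) -> Tuple[Counter, Optional[float]]:
    """Count logouts per calendar day and report the mean gap between them in hours."""
    ordered = sorted(events)
    per_day = Counter(ts.date() for ts in ordered)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None  # None if fewer than two events
    return per_day, mean_gap
```

If the mean gap comes out in hours or days rather than minutes, that fits the "normal but relatively rare" pattern described above.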

Andy

DAMahoney
Contributor

Andy,

I have been busy on other projects for the past few weeks. I ran the load-balancing command and I am still getting the "lost connectivity" messages in my event logs. I don't know what else to try; do you have any more suggestions?

There are still no complaints from users, and I don't see a pattern in when these events happen. The person in charge of backups mentioned that he was getting some iSCSI errors while backups of my VMs were running, but the events I am seeing do not occur during backups.

Thanks,

Dave

DAMahoney
Contributor

If anybody has any information that could help me resolve this issue, it would be greatly appreciated. The errors are still appearing, but the VMs aren't having any problems.

Thanks,

Dave

Andy_Banta
Hot Shot

Dave,

I've been away for a little while, as well.

The message is a harmless artifact of the EqualLogic system performing load balancing, in this case. It's supposed to do this. You can safely disregard this message if it comes up once in a while.

Enjoy,

Andy

Andy_Banta
Hot Shot

I ran the load-balancing command and I am still getting the "lost connectivity" messages in my event logs.

Triggering a load balancing event should demonstrate to you that this is the cause of the log message. Logging in 4.1 is a little tidier and you shouldn't see this message any more.

Andy
