Solved: Re: Change the value of DefaultTimeToWait parameter

Madrilleno21 · ‎11-18-2009

I need to change the value of the parameter DefaultTimeToWait from its current value of 2.

To see the parameter in the VI Client on the ESX host, click the Configuration tab -> Storage Adapters, open the properties of the iSCSI adapter (usually vmhba33/34), then click Advanced. The value is greyed out and cannot be changed.

Does anyone know where it's set?

Madrilleno

Andy_Banta · ‎11-18-2009

Madrilleno,

There's no easy way to change the Time2Wait in the ESX4 iSCSI initiator.

What application are you using that's causing any problem with it?

Andy

Madrilleno21 · ‎11-18-2009

Andy,

It's not an app I'm struggling with but dropped connections to my SAN. VMW support have asked me to alter the value; I think they're barking up the wrong tree, but I need to change it to prove them right or wrong.

Madrilleno

Andy_Banta · ‎11-18-2009

Madrilleno,

There's no way to change it on the ESX host in anything resembling a supported way. If you set it on the target, the ESX iSCSI initiator will honor the time2wait negotiated at target login time.

What are the symptoms you're seeing? What is your configuration? Are you seeing Nop timeouts recorded in /var/log/vmkiscsid.log?

Thanks,

Andy

Madrilleno21 · ‎11-19-2009

Andy,

Config:

3 x Dell R710 running ESX4. Dell EqualLogic PS4000E SAN connected via iSCSI. 2 x HP ProCurve 3500yl-24G switches in between.

MPIO set up, pathing static through two VMkernel ports linked to two pNICs.

I have four errors reported in vCenter this morning from two different hosts, all connection errors. They were at 00.34, 03.14, 04.25 and 05.14. All of these were on a volume called SAN-ISO_STORE, which I use to store .iso files, etc. At the time, none of my VMs had anything mounted from this store, so the problem is DEFINITELY not VM related.

As to the log, yes, there are Nop timeouts shown at the relevant times. The file is attached, but the relevant entries are here as well:

2009-11-19-00:34:01: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.

2009-11-19-00:34:04: iscsid: connection4:0 is operational after recovery (2 attempts)

2009-11-19-03:14:09: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.

2009-11-19-03:14:12: iscsid: connection4:0 is operational after recovery (2 attempts)

2009-11-19-05:14:08: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.

2009-11-19-05:14:12: iscsid: connection4:0 is operational after recovery (2 attempts)
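
For anyone who wants to see how long each drop lasted, here is a minimal Python sketch that pairs each Nop-out timeout with the following recovery line for the same connection. It only assumes the line format shown in the excerpt above; the function and variable names are made up for illustration and are not part of any VMware tool.

```python
import re
from datetime import datetime

# Sample lines copied from the vmkiscsid.log excerpt above.
LOG = """\
2009-11-19-00:34:01: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.
2009-11-19-00:34:04: iscsid: connection4:0 is operational after recovery (2 attempts)
2009-11-19-03:14:09: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.
2009-11-19-03:14:12: iscsid: connection4:0 is operational after recovery (2 attempts)
2009-11-19-05:14:08: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.
2009-11-19-05:14:12: iscsid: connection4:0 is operational after recovery (2 attempts)
"""

TS = r"(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2})"
DROP = re.compile(TS + r": iscsid: Nop-out timedout after \d+ seconds on connection (\d+:\d+)")
RECOVER = re.compile(TS + r": iscsid: connection(\d+:\d+) is operational after recovery")


def parse_ts(text):
    """Parse the timestamp format used in the excerpt (YYYY-MM-DD-HH:MM:SS)."""
    return datetime.strptime(text, "%Y-%m-%d-%H:%M:%S")


def recovery_times(log_text):
    """Return (connection, drop_time, seconds_until_recovery) tuples."""
    pending = {}  # connection id -> time of the last unrecovered drop
    results = []
    for line in log_text.splitlines():
        m = DROP.search(line)
        if m:
            pending[m.group(2)] = parse_ts(m.group(1))
            continue
        m = RECOVER.search(line)
        if m and m.group(2) in pending:
            dropped = pending.pop(m.group(2))
            delta = parse_ts(m.group(1)) - dropped
            results.append((m.group(2), dropped, int(delta.total_seconds())))
    return results


for conn, dropped, secs in recovery_times(LOG):
    print(f"connection {conn}: dropped {dropped}, recovered after {secs}s")
```

On the entries above, every drop is on connection 4:0 and the session is back within a few seconds each time.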

Andy_Banta · ‎11-19-2009

Madrilleno,

Changing your time2wait won't get rid of this.

Do you have any patches installed? If not, you should install the current patches available that take care of some connection issues. If so, since you're already talking to support, tell them it could be another case of PR 484220.

This is a bug VMware has seen with multiple vmkNics connected to storage.

Thanks,

Andy

Madrilleno21 · ‎11-19-2009

My boss has suggested I look at KB1012232, but the support team I'm talking to seems to think from the logs that it is more likely a known bug. Apparently there are some packets being sent to the wrong MAC address and there is an experimental fix to patch it. I'm just going through the install now, but I won't know what the outcome is for a few days. When I know for sure, I'll repost to this thread.

Madrilleno

Andy_Banta · ‎11-19-2009

Madrilleno,

That's the problem being tracked by 484220. There's actually a second experimental patch at this point. Make sure you ask for that.

Hope this gets cleared up.

Thanks,

Andy

Madrilleno21 · ‎11-19-2009

Andy,

Do you have a link to info on PR484220?

Madrilleno

Andy_Banta · ‎11-19-2009

Madrilleno,

I don't have a link. Your support contact can probably supply some information.

The description you got is pretty accurate. In some unusual cases, a frame is sent from ESX with the wrong source MAC address. Since the response to this frame never gets to where it's supposed to go, it's as if the response was dropped. Once a Nop gets put on that session, the Nop times out and the connection gets reset. (That's the way the Nop should work. The problem is the response is getting misdirected and lost.)

Thanks,

Andy

Madrilleno21 · ‎11-20-2009

Andy,

I have marked your answer as correct, thanks for your help. The advice I have been given is to set up the host iSCSI connection with two VMkernel ports per physical NIC. The details are in the attached Dell document. This will give me path redundancy and a consistent connection until a patch is released, although I will have a lot of notifications in the event logs.

If and when the patch is released, I will post back here to complete the history in case any other users come across this post.

Madrilleno

actixsupport · ‎12-09-2009

Hi

Has anyone got any patches/response/workaround for this?

Recently set up 2 x ESX4 hosts with 3 x EqualLogic boxes as per the document attached to the previous post.

Updated to U1 and still seeing these drops and messages, and while migrating VMs via Storage vMotion I'm getting bad hangs/timeouts on the ESX4 hosts. ESX 3.5 hosts are not a problem. I've had to bring it down to 1 Storage vMotion at a time now, whereas with the ESX 3.5 hosts I had 4-6 kicked off at a time in a cluster.

ESX4/vSphere are running on BL390 blades with 10G NICs and 3 connections per NIC as per the doc.

Cheers

Ray

vmkiscsid.log:2009-12-08-05:13:56: iscsid: Nop-out timedout after 10 seconds on connection 16:0 state (3). Dropping session.

vmkiscsid.log:2009-12-08-05:14:26: iscsid: Nop-out timedout after 10 seconds on connection 46:0 state (3). Dropping session.

vmkiscsid.log:2009-12-08-05:14:27: iscsid: Nop-out timedout after 10 seconds on connection 52:0 state (3). Dropping session.

vmkiscsid.log:2009-12-08-05:14:31: iscsid: Nop-out timedout after 10 seconds on connection 58:0 state (3). Dropping session.

vmkiscsid.log:2009-12-08-05:14:32: iscsid: Nop-out timedout after 10 seconds on connection 34:0 state (3). Dropping session.

vmkiscsid.log:2009-12-08-08:44:11: iscsid: Nop-out timedout after 10 seconds on connection 116:0 state (3). Dropping session.

vmkiscsid.log:2009-12-09-10:52:28: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.

vmkiscsid.log:2009-12-09-10:52:42: iscsid: Nop-out timedout after 10 seconds on connection 5:0 state (3). Dropping session.
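
To see whether the drops arrive in clusters across several connections, the entries above can be grouped into bursts. This is a rough Python sketch, not a supported tool: the 60-second gap used to split bursts is an arbitrary assumption, and the names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Lines copied from the grep output above.
LOG = """\
vmkiscsid.log:2009-12-08-05:13:56: iscsid: Nop-out timedout after 10 seconds on connection 16:0 state (3). Dropping session.
vmkiscsid.log:2009-12-08-05:14:26: iscsid: Nop-out timedout after 10 seconds on connection 46:0 state (3). Dropping session.
vmkiscsid.log:2009-12-08-05:14:27: iscsid: Nop-out timedout after 10 seconds on connection 52:0 state (3). Dropping session.
vmkiscsid.log:2009-12-08-05:14:31: iscsid: Nop-out timedout after 10 seconds on connection 58:0 state (3). Dropping session.
vmkiscsid.log:2009-12-08-05:14:32: iscsid: Nop-out timedout after 10 seconds on connection 34:0 state (3). Dropping session.
vmkiscsid.log:2009-12-08-08:44:11: iscsid: Nop-out timedout after 10 seconds on connection 116:0 state (3). Dropping session.
vmkiscsid.log:2009-12-09-10:52:28: iscsid: Nop-out timedout after 10 seconds on connection 4:0 state (3). Dropping session.
vmkiscsid.log:2009-12-09-10:52:42: iscsid: Nop-out timedout after 10 seconds on connection 5:0 state (3). Dropping session.
"""

PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}): iscsid: "
    r"Nop-out timedout after \d+ seconds on connection (\d+:\d+)"
)


def drop_bursts(log_text, max_gap=timedelta(seconds=60)):
    """Group Nop-out timeouts into bursts.

    A new burst starts whenever the gap to the previous timeout exceeds
    max_gap (60 seconds here, an assumed threshold).
    """
    events = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d-%H:%M:%S")
            events.append((when, m.group(2)))
    events.sort()

    bursts = []
    for when, conn in events:
        if bursts and when - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((when, conn))
        else:
            bursts.append([(when, conn)])
    return bursts


for burst in drop_bursts(LOG):
    conns = ", ".join(conn for _, conn in burst)
    print(f"{burst[0][0]}: {len(burst)} drop(s) on connections {conns}")
```

With the lines above, the first five timeouts fall into a single burst spread over five different connections, which is what you would expect if the cause is on the host or network side rather than on one particular session.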

Andy_Banta · ‎12-09-2009

Has anyone got any patches/response/work around with this?

The problem has been identified and a trial fix has been put together. Once tested, a patch should be forthcoming.

Andy

s1xth · ‎12-10-2009

Andy...

Does this patch also fix the issues with the initiator dropping? For example, the thread on the EqualLogic iSCSI issues here in the storage forum...

I receive these messages:

INFO 12/10/09 5:39:28 AM psisan1 iSCSI login to target '10.10.5.12:3260, iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' from initiator '10.10.5.14:62243, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' successful, using Jumbo Frame length.

INFO 12/10/09 5:39:25 AM psisan1 iSCSI session to target '10.10.5.12:3260, iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' from initiator '10.10.5.14:51576, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' was closed. iSCSI initiator connection failure. Connection was closed by peer.
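
If it helps anyone compare notes, here is a small Python sketch that pulls the timestamp and the initiator/target endpoints out of the two array events above. The regular expression is only an assumption based on these two sample lines, not a documented EqualLogic log format, and the names are made up for illustration.

```python
import re
from datetime import datetime

# The two events quoted above, copied as-is.
EVENTS = [
    "INFO 12/10/09 5:39:28 AM psisan1 iSCSI login to target "
    "'10.10.5.12:3260, iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' "
    "from initiator '10.10.5.14:62243, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' "
    "successful, using Jumbo Frame length.",
    "INFO 12/10/09 5:39:25 AM psisan1 iSCSI session to target "
    "'10.10.5.12:3260, iqn.2001-05.com.equallogic:0-8a0906-7325b9d04-168000cfdd44b155-esxvol1' "
    "from initiator '10.10.5.14:51576, iqn.1998-01.com.vmware:vh3psrv3-1903c5bc' "
    "was closed. iSCSI initiator connection failure. Connection was closed by peer.",
]

EVENT = re.compile(
    r"(?P<when>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) \S+ iSCSI (?P<kind>login|session) "
    r"to target '(?P<target_addr>[^,]+), (?P<target_iqn>[^']+)' "
    r"from initiator '(?P<init_addr>[^,]+), (?P<init_iqn>[^']+)'"
)


def parse_event(line):
    """Return the matched fields as a dict, with 'when' parsed to a datetime."""
    m = EVENT.search(line)
    if m is None:
        raise ValueError("unrecognised event line")
    fields = m.groupdict()
    fields["when"] = datetime.strptime(fields["when"], "%m/%d/%y %I:%M:%S %p")
    return fields


# Oldest first: the session close, then the new login.
closed, login = sorted((parse_event(e) for e in EVENTS), key=lambda e: e["when"])
gap = (login["when"] - closed["when"]).total_seconds()

print(f"closed from {closed['init_addr']}, new login from {login['init_addr']}")
print(f"re-login {gap:.0f}s after the close")
```

The new login comes from the same initiator IP but a different source port, a few seconds after the close, so the initiator reconnected on a fresh TCP connection.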

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi

Andy_Banta · ‎12-10-2009

Does this patch also fix the issues with the initiator dropping? For example, the thread on the EqualLogic iSCSI issues here in the storage forum.

This could be one of the symptoms of the problem. Without seeing a wire trace, it's hard to say for certain. Try the patch first when it comes out.

Thanks,

Andy

s1xth · ‎12-10-2009

Andy...

I am assuming this will be a VMware patch? Will this be included in the next patch cycle? Since U1 came out not too long ago, does that mean we will be waiting a while for this release?

Thanks!

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi

Andy_Banta · ‎12-10-2009

I am assuming this will be a VMware patch?

Yes. I don't have any additional info on timing.

Andy
