VMware Cloud Community
tektotket
Enthusiast

Windows iSCSI initiator volumes inaccessible after vMotion attempt

vCenter 6.0.0 build 4541948

ESXi 5.5.0 build 356872 hosts in the vCenter cluster

Windows Server 2008 R2 SP1 Datacenter VM

A number of VMs were successfully vMotioned between hosts a few days ago. One VM failed to complete the vMotion with the error "The VM failed to resume on the destination during early power on"; this matches VMware KB 2125133, "vMotioning virtual machines fails with Error: Failed to attach filter 'yyy' to scsi".

The VM that failed was the only one with direct iSCSI volumes to our SAN through the Windows iSCSI initiator. After the failure it was left as is, and it now appears on the target host even though the vMotion appeared to fail. The VMDK volumes for this VM are still accessible from Windows, but the two direct iSCSI volumes connected through the Windows iSCSI initiator are no longer accessible or responding. The Windows iSCSI initiator service has been restarted and the VM has been rebooted with no change; any task that requires access to the direct iSCSI volumes stalls.

Both the VMDKs and the direct iSCSI Windows volumes are on the same Nimble SAN. The iSCSI target IP can still be pinged from Windows, and the Windows iSCSI initiator properties still show the sessions as connected, but it stops responding when trying to pull up device or volume properties associated with those sessions. Any suggestions would be appreciated.
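For reference, a minimal sketch of how the initiator's own view of the sessions can be captured from the command line when the GUI stalls. It assumes Python 3 is available on the guest and uses the built-in iscsicli.exe tool (the SessionList and ReportTargetMappings subcommands are assumed to behave as documented); the 30-second timeout is an arbitrary choice so a hung storage stack can't block the script.

```python
# Dump what the Microsoft iSCSI initiator reports for its sessions and device
# mappings, so a hung session still shows up even when the GUI stalls.
import subprocess

def run(cmd):
    """Run a command, giving up after 30 s so a hung storage stack can't stall the script."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return result.stdout or result.stderr
    except subprocess.TimeoutExpired:
        return "TIMED OUT: " + " ".join(cmd)

# Active sessions and their connections, as the initiator sees them.
print(run(["iscsicli", "SessionList"]))
# Which sessions map to which target/LUN/device.
print(run(["iscsicli", "ReportTargetMappings"]))
```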

Windows events:

Log Name:      System
Source:        iScsiPrt
Date:          1/23/2017 8:28:10 AM
Event ID:      7
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      servername.mydomain.com
Description:   The initiator could not send an iSCSI PDU. Error status is given in the dump data.

Log Name:      System
Source:        iScsiPrt
Date:          1/23/2017 9:56:53 AM
Event ID:      49
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      servername.mydomain.com
Description:   Target failed to respond in time to a Task Management request.
Data:          \Device\RaidPort1

Log Name:      System
Source:        iScsiPrt
Date:          1/23/2017 9:56:53 AM
Event ID:      9
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      servername.mydomain.com
Description:   Target did not respond in time for a SCSI request. The CDB is given in the dump data.
Data:          \Device\RaidPort1

Log Name:      System
Source:        iScsiPrt
Date:          1/23/2017 9:56:53 AM
Event ID:      63
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      servername.mydomain.com
Description:   Can not Reset the Target or LUN. Will attempt session recovery.
Data:          \Device\RaidPort1
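A minimal sketch for pulling these iScsiPrt errors straight out of the System log, to line their timestamps up with the vMotion attempt. It assumes Python 3 and the built-in wevtutil.exe tool; the event IDs in the query (7, 9, 49, 63) are simply the ones from the excerpts above.

```python
import subprocess

# XPath filter for the iScsiPrt error IDs seen above.
query = ("*[System[Provider[@Name='iScsiPrt'] and "
         "(EventID=7 or EventID=9 or EventID=49 or EventID=63)]]")

# /f:text renders events as readable text, /c:50 caps the count, /rd:true returns newest first.
result = subprocess.run(
    ["wevtutil", "qe", "System", "/q:" + query, "/f:text", "/c:50", "/rd:true"],
    capture_output=True, text=True)
print(result.stdout)
```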
2 Replies
vcallaway
Enthusiast
Accepted Solution

This doesn't quite add up, since the initiator IQNs stay the same when you're connecting from inside the guest, but is there some type of ACL restriction on the LUNs themselves? If you can ping from the VM on the destination host, I'd assume your networking is good, but it's worth asking whether the underlying physical networking matches the source vMotion host. MTU? VLANs? Also, does the LUN have shared access enabled?

tektotket
Enthusiast

Thanks for the follow-up. There is an ACL via IQN on the SAN volume, and this guest is the only system connected to the volume, as verified from the SAN. The numbering of the 10Gb NICs dedicated to the direct iSCSI differs on each host: the source was vSwitch5 / vmnic9, the destination was vSwitch4 / vmnic5, and the port group naming on both is the same. I see now that the destination host's direct-iSCSI vSwitch MTU had been left at 1500, while our SAN, the physical switch ports, the source host vSwitch, and the guest VM NIC for the iSCSI initiator are all set to jumbo 9000. I won't be able to test this again in the near future, but that MTU mismatch has to be the culprit.

I ended up going with a quick fix: shutting down the VM, manually removing it from inventory on the destination host, and re-adding it on the source host, where the storage became accessible again. I was also looking at the problem from this Microsoft angle, but it seems less likely at this point: https://support.microsoft.com/en-us/help/967476/iscsi-favorite-targets-may-need-to-be-re-created-if-...
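For anyone hitting the same thing, a quick sketch of a jumbo-frame check that can be run from the guest to confirm an MTU mismatch like this: ping the iSCSI target with the Don't Fragment bit set at increasing payload sizes. An 8972-byte payload plus 28 bytes of IP/ICMP headers fills a 9000-byte MTU, so it should pass when every hop (guest NIC, vSwitch, physical switch, SAN) is at 9000 and fail if any hop is still at 1500. The target address below is a placeholder, and the script assumes Python 3 on the guest.

```python
import subprocess

TARGET = "192.168.100.10"  # placeholder for the iSCSI target portal IP

# 1472 fits a standard 1500-byte MTU; 8972 fills a 9000-byte jumbo MTU.
for payload in (1472, 4000, 8000, 8972):
    # Windows ping: -f = don't fragment, -l = payload size in bytes, -n 1 = a single echo request.
    result = subprocess.run(
        ["ping", "-f", "-l", str(payload), "-n", "1", TARGET],
        capture_output=True, text=True)
    ok = "TTL=" in result.stdout
    print(f"{payload} bytes: {'replied' if ok else 'failed (dropped or fragmentation needed)'}")
```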
