VMware Cloud Community
marauli
Enthusiast

Troubleshooting iSCSI

Our Meraki network switches had their firmware updated, and all of a sudden 3 out of 4 ESXis lost connectivity to an iSCSI target.

(The target is a Dell R730 server running Ubuntu 22.04.3 LTS whose sole purpose is to serve as iSCSI storage for a VMware cluster. The ESXi hosts run 7.0.3.)

When I rescan the iSCSI storage adapter on an ESXi host that can no longer mount the datastore, the host appears to recognize the LUN: it adds an entry to "static targets", presumably based on scanning the dynamic ones:

[Screenshot: static targets on an affected host]

(The device / target / LUN is highlighted in green.)

Yet I don't see it presented as a "device" under "Devices" (one I could mount as a datastore, or create a datastore on):

[Screenshot: "Devices" list on an affected host]

For comparison, here is one of the ESXis that can see the device:

[Screenshot: "Devices" list on a working host]

It shows as "Degraded" (probably because of a lack of NIC redundancy; where would I look to confirm?), yet it does show up, and I can seemingly create a datastore on that target.

I also spun up an older standalone ESXi 6.7 host, and it can see the device too. So can my Windows desktop.

How would I troubleshoot this issue on the ESXi hosts that can't seem to recognize the iSCSI target as a valid device?

Thanks!

P.S. (Edit) '/var/log/vobd.log' has a number of entries like these, which point to a network configuration issue:
2023-08-04T23:31:20.246Z: [iscsiCorrelator] 624003451802us: [vob.iscsi.target.connect.error] vmhba64 @ vmk1 failed to login to iqn.1988-11.com.dell:01.array.bc305bf24e32 because of a network connection failure.
2023-08-04T23:31:20.246Z: [iscsiCorrelator] 624000365087us: [esx.problem.storage.iscsi.target.connect.error] Login to iSCSI target iqn.1988-11.com.dell:01.array.bc305bf24e32 on vmhba64 @ vmk1 failed. The iSCSI initiator could not establish a network connection to the target.
2023-08-07T16:25:21.187Z: [iscsiCorrelator] 857645568069us: [vob.iscsi.discovery.connect.error] discovery failure on vmhba64 to r730b-00.datastores.infra.<masked>.com because of a network connection failure.
2023-08-07T16:25:21.187Z: [iscsiCorrelator] 857641306178us: [esx.problem.storage.iscsi.discovery.connect.error] iSCSI discovery to r730b-00.datastores.infra.<masked>.com on vmhba64 failed. The iSCSI Initiator could not establish a network connection to the discovery address.
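Each failure above is logged twice (a vob.* entry and an esx.problem.* entry), so it can help to reduce the log to just the adapter, vmk and target involved. A minimal Python sketch, assuming the line layout stays exactly as in the excerpt above; the regexes and the parse_iscsi_errors helper are hypothetical, not a VMware tool:

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumed layout, taken from the excerpt above:
# <timestamp>: [<correlator>] <uptime>us: [<event id>] <message>
LINE_RE = re.compile(r"^(\S+): \[([^\]]+)\] \d+us: \[([^\]]+)\] (.*)$")
# "vmhba64 @ vmk1": adapter plus (optionally) the vmkernel port it used
ADAPTER_RE = re.compile(r"\b(vmhba\d+)(?: @ (vmk\d+))?")
# Target IQN, when the message names one
IQN_RE = re.compile(r"\b(iqn\.\S+)")


@dataclass
class IscsiEvent:
    timestamp: str
    event_id: str
    adapter: Optional[str]
    vmk: Optional[str]         # None when the message names no vmkernel port
    target_iqn: Optional[str]  # None for discovery errors (they name an address)


def parse_iscsi_errors(lines: Iterable[str]) -> List[IscsiEvent]:
    """Return one record per iSCSI login/discovery connect-error line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not in the expected vobd.log layout
        timestamp, _correlator, event_id, message = m.groups()
        if "iscsi" not in event_id or not event_id.endswith("connect.error"):
            continue  # keep only iSCSI connect errors
        adapter = ADAPTER_RE.search(message)
        iqn = IQN_RE.search(message)
        events.append(IscsiEvent(
            timestamp=timestamp,
            event_id=event_id,
            adapter=adapter.group(1) if adapter else None,
            vmk=adapter.group(2) if adapter else None,
            target_iqn=iqn.group(1) if iqn else None,
        ))
    return events
```

For example, passing it open("/var/log/vobd.log") and printing each record shows which vmkernel ports keep failing; in the excerpt above, every target login error is on vmhba64 @ vmk1.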
What command could I run on the affected ESXi hosts to confirm whether they have the necessary connectivity to the target?
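On the host itself, vmkping -I vmk1 <target IP> is the usual way to test reachability from a specific vmkernel port. A ping succeeding does not prove the iSCSI port is open, though, so a plain TCP connection to the portal is a useful second check. A minimal Python sketch, assuming the default iSCSI port 3260 (your target may listen elsewhere); can_reach_target is a hypothetical helper:

```python
import socket
from typing import Optional

ISCSI_PORT = 3260  # IANA-registered iSCSI port; assumed, check the target's config


def can_reach_target(host: str, port: int = ISCSI_PORT,
                     source_ip: Optional[str] = None,
                     timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the iSCSI portal succeeds.

    source_ip optionally binds the client side to a local address (e.g. the
    IP of the vmk you expect iSCSI traffic to use). This only sets the source
    address; the OS routing table still picks the outgoing interface.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout subclasses OSError)
        # and DNS failures (socket.gaierror) all land here.
        return False
```

Running it from an affected host's network and from a working one against the same portal narrows the problem down to either the path or the target.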

21 Replies
marauli

Resolved: all four ESXis can now see the iSCSI target in question.

Misconfigured iSCSI port binding was likely one of the culprits, as @a_p_ suggested.

The other was likely stale configuration; rebooting, rescanning, and detaching stale devices, then rebooting and rescanning several more times, seems to have helped.

Thank you all very, very much for the help! (I've been at it for quite some time; it feels good to make some progress.)

marauli

The culprit wasn't network port binding. It seems to have something to do with the ESXis holding on to stale paths and entries in Dynamic and Static Discovery.

Here are some tests I did:

  1. Enabled a deliberately misconfigured Network Port Binding (on unrelated vmks that are not connected to the iSCSI targets in question and are not in the same local subnet) to see whether it would cause issues with an iSCSI target and datastore attached to a different vmk
    • The datastore on the iSCSI target in question is still accessible and browsable
    • Rescanned the adapter - no change
    • Rebooted the ESXi host - no change
    • Disconnected the datastore in question by removing items in static and dynamic discovery and rescanning the adapter
    • The datastore in question is no longer accessible
    • Added needed items to the Dynamic Discovery, rescanned the adapter
    • The device and the datastore showed up
    • (I think this demonstrates that misconfigured Network Port Binding had no effect on this issue.)
  2. CHAP misconfiguration resulted in losing access to the datastore, which was restored only by removing and re-adding the datastore: removing the relevant items in Dynamic and Static Discovery, rescanning, and then re-adding them.
    • Configured CHAP (incoming) authentication on the target (no changes on the ESXi hosts - not yet)
    • The datastore became inaccessible ⛔
    • Tried to configure one of the ESXi hosts to use CHAP authentication. No luck, likely because of something I was doing wrong. Or possibly it's the same "stale info" issue, and I needed to remove and re-add the target before the ESXis could access the datastore with CHAP configured.
    • Cleared the CHAP configs on both sides, restarted the iSCSI service on the target, rescanned the adapter - no change.
    • Rebooted the target, rebooted the ESXi host, verified iSCSI configuration, rescanned several times - no change. Datastore still inaccessible. ⛔
    • Removed the relevant items in the Dynamic and Static Discovery, rescanned the adapter, thus effectively removing the datastore. Re-added them on one (not all) of the ESXi hosts. The datastore magically showed up. ✅ Surprisingly, it also showed up and became accessible on all the other ESXi hosts, without any changes there and without even rescanning or refreshing anything. (Weird, isn't it?)
      • I think this was the original issue I experienced. The datastore was available and could be connected all along; rather than just rescanning the adapter, I needed to perform the steps above: effectively remove the iSCSI target, rescan, re-add it, and hope the device and the datastore would show up.
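Test 1 deliberately bound vmks that were not in the target's local subnet. Port binding is generally meant for setups where the bound vmks and the target portal share a subnet, so before enabling it you could sanity-check that condition. A minimal sketch using Python's ipaddress module; it assumes you have already collected each bound vmk's address and prefix length (e.g. from the vSphere UI), and the vmk names and addresses in the test are hypothetical:

```python
import ipaddress
from typing import Dict, List


def vmks_outside_target_subnet(bound_vmks: Dict[str, str], target_ip: str) -> List[str]:
    """Return the names of bound vmks whose subnet does not contain target_ip.

    bound_vmks maps a vmk name to its address in CIDR form, e.g. "10.0.50.11/24".
    """
    target = ipaddress.ip_address(target_ip)
    mismatched = []
    for name, cidr in sorted(bound_vmks.items()):
        # strict=False accepts a host address (host bits set), not just a network
        network = ipaddress.ip_network(cidr, strict=False)
        if target not in network:
            mismatched.append(name)
    return mismatched
```

Any vmk this returns would, under the assumption above, be a candidate for removal from the port binding list.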

To sum up, if an iSCSI datastore disappeared on an ESXi host and isn't showing up no matter how many times you reboot or rescan, try this:

  1. Remove the target and the datastore by:
    1. removing the relevant items in the Static and Dynamic Discovery tabs of the "iSCSI Software Adapter" configuration on the ESXi host
    2. rescanning the adapter and confirming that the relevant device(s) and the datastore are gone
  2. Re-add the target by adding the relevant items back to Static and Dynamic Discovery and rescanning the adapter.
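The GUI steps above map roughly onto esxcli. As a sketch only, the helper below builds (but does not run) the corresponding command sequence so you can review it and run it by hand. It assumes the adapter name vmhba64 from the logs, a hypothetical portal address, and the esxcli option spellings as I know them; double-check each with --help on your build first. It re-adds only Dynamic Discovery, since in test 1 that plus a rescan was enough to bring the device back:

```python
import shlex
from typing import List

Command = List[str]


def readd_iscsi_target_commands(adapter: str, portal: str, target_iqn: str) -> List[Command]:
    """Build the esxcli remove/rescan/re-add sequence (hypothetical helper).

    adapter:    e.g. "vmhba64"
    portal:     discovery address as "host:port"
    target_iqn: the target's IQN as listed under Static Discovery
    """
    rescan = ["esxcli", "storage", "core", "adapter", "rescan", f"--adapter={adapter}"]
    return [
        # Step 1.1: remove the Static and Dynamic Discovery entries
        ["esxcli", "iscsi", "adapter", "discovery", "statictarget", "remove",
         f"--adapter={adapter}", f"--address={portal}", f"--name={target_iqn}"],
        ["esxcli", "iscsi", "adapter", "discovery", "sendtarget", "remove",
         f"--adapter={adapter}", f"--address={portal}"],
        # Step 1.2: rescan, then confirm the device is gone from the device list
        rescan,
        ["esxcli", "storage", "core", "device", "list"],
        # Step 2: re-add Dynamic Discovery and rescan; send-targets discovery
        # is expected to repopulate Static Discovery
        ["esxcli", "iscsi", "adapter", "discovery", "sendtarget", "add",
         f"--adapter={adapter}", f"--address={portal}"],
        list(rescan),
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for your portal
    for cmd in readd_iscsi_target_commands(
            "vmhba64", "192.0.2.10:3260",
            "iqn.1988-11.com.dell:01.array.bc305bf24e32"):
        print(shlex.join(cmd))
```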

This worked for me and I hope this works for someone else in a similar situation.
