VMware Cloud Community
JohnBons
Enthusiast

Cannot find a path to device vmhba

Good morning,

We have a problem with one of the hosts in our cluster (3 hosts).

The VMs were still online, but the host wasn't (according to vCenter). I couldn't log in to the console.

After a reboot it was working again.

Now I want to know where to start troubleshooting this host.

Message was edited by: JohnBons Removed some details

AndreTheGiant
Immortal

Do you have an FC SAN?

Are there any errors on the storage and/or the SAN switches?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
JohnBons
Enthusiast

We're using NetApp iSCSI.

I'm checking with our storage department whether there were any problems last weekend.

I'm also asking our network department.

AndreTheGiant
Immortal

But the error message was clear: you lost connectivity with your LUN.

So check the network connection and make really sure that it is OK.

Also run a failover test with multipathing (temporarily disable a path in the datastore / multipath configuration) to make sure that it is working properly.
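
As a rough first pass at the network check, here is a minimal sketch that tests whether each iSCSI target portal accepts a TCP connection on port 3260 (the default iSCSI port). The function names and the example addresses are made up for illustration. It only proves reachability from the machine where the script runs, not from the ESX host's own iSCSI interface, so treat it as a hint rather than a verdict.

```python
import socket

ISCSI_PORT = 3260  # default iSCSI target port


def portal_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_portals(portals, timeout=3.0):
    """Map each (host, port) portal to its reachability."""
    return {p: portal_reachable(p[0], p[1], timeout) for p in portals}


if __name__ == "__main__":
    # Hypothetical portal addresses; replace with your own target portals.
    portals = [("192.0.2.10", ISCSI_PORT), ("192.0.2.11", ISCSI_PORT)]
    for portal, ok in check_portals(portals).items():
        print(portal, "reachable" if ok else "NOT reachable")
```

Running it once per path, before and after disabling a path for the failover test, gives a quick picture of which portals still answer.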

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
JohnBons
Enthusiast

But the weird part is that the vCenter Server was reporting that the host was unavailable, while the console was reporting that a LUN was unavailable.

Is that normal behavior for the vCenter Server?

The network department didn't see any weird hiccups last weekend.

AndreTheGiant
Immortal

And the console was reporting that a LUN was unavailable.

This IS the big problem, and it could be caused by a network problem.

But the weird part is that the vcserver was reporting that the host was unavailable.

This could be related to a network problem, or to ESX being too slow to acknowledge VC (during a path rescan).

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
JohnBons
Enthusiast

I talked to VMware about this, and they advised a number of things to check.

1. Try to serialize the operations on the shared LUNs; if possible, limit the number of operations on different hosts that require a SCSI reservation at the same time.

2. Increase the number of LUNs and try to limit the number of ESX hosts accessing the same LUN.

3. Avoid using snapshots as this causes a lot of SCSI reservations.

4. Do not schedule backups (VCB or console based) in parallel from the same LUN.

5. Try to limit the number of virtual machines per LUN.

6. What targets are being used to access LUNs?

7. Check if you have the latest HBA firmware across all ESX hosts.

8. Is the ESX running the latest BIOS (avoid conflict with HBA drivers)?

9. Contact your SAN vendor for information on SP timeout values and performance settings and storage array firmware.

10. Turn off 3rd party agents (storage agents) and RPMs not certified for ESX.

11. Check for MSCS RDMs (the active node holds a permanent reservation).

12. Ensure correct Host Mode setting on the SAN array.

13. LUNs removed from the system without rescanning can appear as locked.

14. When SPs fail to release the reservation, either the request did not come through (hardware, firmware, pathing problems) or 3rd party apps running on the service console did not send the release. Busy virtual machine operations are still holding the lock.

Note: Use of SATA disks is not recommended in high I/O configuration or when the above changes do not resolve the problem while SATA disks are used.
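
To illustrate points 1 and 4 (serializing operations and not running backups in parallel on the same LUN), here is a minimal sketch of how a home-grown job scheduler could queue work per LUN. It is not a VMware API. The class name, the LUN identifiers and the jobs are all hypothetical, and it only serializes jobs started from a single process, so it does not coordinate separate ESX hosts by itself.

```python
import threading
from collections import defaultdict


class LunSerializer:
    """Run at most one operation at a time per LUN (within this process)."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # one lock per LUN id
        self._guard = threading.Lock()             # protects the dict itself

    def _lock_for(self, lun_id):
        with self._guard:
            return self._locks[lun_id]

    def run(self, lun_id, operation, *args, **kwargs):
        """Block until the LUN is free, then run the operation and return its result."""
        with self._lock_for(lun_id):
            return operation(*args, **kwargs)


if __name__ == "__main__":
    def backup(vm_name):
        # Placeholder for a real backup or snapshot job.
        print("backing up", vm_name)

    serializer = LunSerializer()
    # Hypothetical jobs: both VMs live on the same LUN, so they run one after another.
    jobs = [("lun-01", "vm-a"), ("lun-01", "vm-b"), ("lun-02", "vm-c")]
    threads = [threading.Thread(target=serializer.run, args=(lun, backup, vm))
               for lun, vm in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Jobs on different LUNs still run in parallel; only jobs that target the same LUN wait for each other.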
