VMware Cloud Community
jasonawinters
Contributor

vSphere Client (4.1) stuck at "loading inventory" when shared storage is down

In the course of testing a new vSphere 4.1 infrastructure, we lost connectivity to the SAN due to a power issue. iSCSI shared storage for all ESX servers in the cluster was cut off, taking down vCenter as well, which is installed in a VM. When we tried to log on to the individual ESX servers using the vSphere client to begin troubleshooting, we encountered an error that stated:

"the vSphere Client could not connect to <ipaddress_withheld>. A timeout occurred while loading the inventory."

The system uses the Nexus 1000v virtual distributed switch and two Nexus 5010 physical switches. Each ESX server has two physical 10GbE NICs attached to the Nexus 5010, as does the SAN. Both NICs on each ESX host are set up on the 1000v virtual switch, where the vmkernel port for each ESX host is connected as well. When connectivity to the SAN was lost, we were still able to ping all of the ESX hosts, and were able to ping and SSH into the Nexus 1000v vDS.
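
For anyone wanting to script the same reachability checks, here is a minimal Python sketch. It is not something that was run during the outage; it only illustrates the distinction above between "the host answers on the network" and "the storage path is down". The addresses are hypothetical placeholders, and it assumes the default ports: TCP 443 for the vSphere Client connection to a host and TCP 3260 for the iSCSI target.

```python
import socket

MGMT_PORT = 443    # assumed default HTTPS port used by the vSphere Client
ISCSI_PORT = 3260  # assumed default iSCSI target port


def probe(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        sock = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
    sock.close()
    return True


def classify(host_ok, san_ok):
    """Map the two probe results onto the situations described in this thread."""
    if host_ok and not san_ok:
        return "host reachable, SAN unreachable"
    if host_ok and san_ok:
        return "host and SAN reachable"
    return "host unreachable"


if __name__ == "__main__":
    # Hypothetical addresses; substitute your own.
    esx_hosts = ["192.0.2.11", "192.0.2.12"]
    san_portal = "192.0.2.50"
    san_ok = probe(san_portal, ISCSI_PORT)
    for host in esx_hosts:
        print(host, "->", classify(probe(host, MGMT_PORT), san_ok))
```

A result of "host reachable, SAN unreachable" would correspond to the state in which the client timed out for us. Note that a TCP probe only shows the port is open; it says nothing about whether the management service behind it can finish loading the inventory.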

After bringing the SAN back online, we were able to connect into vCenter normally, as well as connect into each individual host using the vSphere client.

Our plans are to eventually connect another management interface to the built-in 1Gb ports on each ESX server for "out of band" management, in case we have issues with connections through the Nexus physical or virtual switches.

Any ideas on why I am unable to connect to the ESX servers while the shared storage is down? Again, the vSphere client connects, but gets stuck at the "loading inventory" stage, then times out. This happens on all 4 hosts. We also tested again after the power failure by disconnecting the ports on which the SAN is connected to the Nexus 5010, and experienced the same symptoms.

Thanks for any help or suggestions

Jason Winters

Jason Winters MS Systems Consultant Trace Systems VCAP-DCD / MCITP / CISSP
4 Replies
jasonawinters
Contributor

Update: I added another vmkernel port on one of the ESX server's built-in NICs, connected to the ESX server's own virtual switch (not the n1000v). When the SAN was up, I was able to log into that server on the new vmkernel port's IP address. As soon as we disconnected the SAN, I got the same error connecting to that new IP. So, IMO, this has nothing to do with the config of the virtual distributed switch, but is possibly an ESX limitation in loading inventory items that may be located on shared storage. I do agree that if the SAN is totally down, the system is pretty much useless anyway, but I would like to be able to connect to the servers remotely in order to shut them down or troubleshoot anything else in the case of a total SAN outage.

bulletprooffool
Champion

The loss of the VC will not affect your ability to connect to the host.

Are your hosts booting from SAN?

If they are, then you have no access to any data other than what is actually held at runtime, and thus no ability to load the inventory, manage the host, etc.

One day I will virtualise myself . . .
jasonawinters
Contributor

Right, that is what I am trying to do: connect directly to the host. When the SAN is shut down, I want to be able to log directly into the host. Of course, if VC is down you can't log into it... :) The hosts are not booting from the SAN; ESXi is installed directly on their local disks. As I stated in my original post, when I launch the vSphere client and point it directly at the host, it does connect, but fails at the "loading inventory" phase of launch.

-J

jasonawinters
Contributor

Update: Yesterday I had the idea that if we rebooted the ESX servers after the SAN had gone offline, we might be able to get into the hosts with the vSphere client. In the course of preparing to do this, I found that after disconnecting the iSCSI connection from one of the hosts, I was able to bring it online while the SAN was down. That, coupled with a new error message I found:

Call "HostBootDeviceSystem.QueryBootDevices" for object "bootDeviceSystem-19" on vCenter Server "" failed

leads me to believe that when the vSphere client loads, it tries to query information about any datastores the host may be connected to. When the SAN suddenly fails and iSCSI is still connected, it seems that the client cannot query this info and eventually just fails to load. After we tried one more test of dropping the SAN and then rebooting the hosts, we were able to get into the client without issue.

But this caused another issue: the Nexus 1000v configuration somehow became unsynchronized with vCenter. This was a real pain to figure out. It only happened to two of the hosts, but because of it, every time you opened vCenter, all of the hosts would be disconnected. You could reconnect them, but within 30 seconds they would all disconnect again.

I was finally able to solve that problem by going to the ESXi console of those two particular hosts, moving the network interfaces back to a virtual switch (off of the n1000v), and then going back to the n1000v configuration within vCenter and running through the "manage hosts" options. From there I was able to add the interfaces back and remap the vmkernel port to the vDS. After that was complete, I went to the console of the n1000v, did a "copy run start" to save the configuration, and then did a "reload" to reload the config into memory. Once that was complete, I bounced the ESX hosts one more time and everything came back normally.

Another note: on a few of the virtual machines (including the n1000v VM) I had to manually reconnect the VM NICs and set them to the correct vDS port group.

Our recommended course of action when the SAN goes down: don't reboot the ESX hosts. Get the SAN online first, and then reconnect to vSphere to assess any issues that you need to fix. If the SAN is completely down, nothing is being served through vSphere anyway. I still believe there is some kind of bug in the vSphere client during the startup phase that could be fixed so that it does not hang while checking the SAN-connected drives.
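
To illustrate what "does not hang" could look like, here is a small Python sketch. It is not how the vSphere Client is actually implemented, and the function and query names are made up for the example. It only shows the general idea of giving each inventory query its own timeout, so that one query blocked on unreachable storage is reported as timed out while the rest of the inventory still loads.

```python
import concurrent.futures as cf
import time


def load_inventory(queries, per_query_timeout):
    """Run each named query in its own thread with its own timeout.

    `queries` maps a label to a zero-argument callable (hypothetical names).
    Returns (results, timed_out): results for queries that finished in time,
    and the labels of those that did not.
    """
    results, timed_out = {}, []
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(queries)))
    try:
        futures = {name: pool.submit(fn) for name, fn in queries.items()}
        for name, fut in futures.items():
            try:
                results[name] = fut.result(timeout=per_query_timeout)
            except cf.TimeoutError:
                # Stop waiting on this query; the others are unaffected.
                timed_out.append(name)
    finally:
        # Do not block on workers that are still stuck.
        pool.shutdown(wait=False)
    return results, timed_out


if __name__ == "__main__":
    def list_vms():
        return ["vm1", "vm2"]

    def list_datastores():
        # Stand-in for a query that blocks because the SAN is unreachable.
        time.sleep(2)
        return ["iscsi-datastore"]

    done, stuck = load_inventory(
        {"vms": list_vms, "datastores": list_datastores},
        per_query_timeout=0.5,
    )
    print("loaded:", done)
    print("timed out:", stuck)
```

The sketch only stops waiting for the blocked query; the worker thread itself keeps running until the underlying call returns, so this is a way to keep the rest of the load responsive, not a way to cancel a stuck storage request.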
