thickclouds
Enthusiast

Stumped

So... I have a challenge and am curious what you all think.

Our operations team has a prod host in the following state:

  • Disconnected in vCenter

  • Operational as far as HA is concerned

  • Running VMs with no issues

  • No SSH or iLO console access

  • Responds to pings

  • No errors on the switch for either SC port

Where does one go from here? If HA is communicating, can it send instructions? Any thoughts?

Charlie Gautreaux vExpert http://www.thickclouds.com
12 Replies
Troy_Clavell
Immortal

My guess is that hostd has crashed, which in most cases will require a reboot to fix. Given that you have no SSH access or any other kind of remote access, you will have to be in front of the console itself. Restarting hostd may fix the issue if it isn't completely hosed:


service mgmt-vmware restart

Even using PowerCLI won't work, because it won't be able to restart any of the management agents.

http://communities.vmware.com/thread/236538

Have you tried right-clicking the host and choosing "Connect"?

aCrazyPenguin
Enthusiast

Hi there

A few things to check:

  • Are you able to manage it by connecting your VI Client directly to the host?

  • Have you tried restarting the management agents (at the console, type service mgmt-vmware restart)?

  • Have you tried restarting the VC agent (at the console, type /etc/rc.d/init.d/vmware-vpxa restart)?

Regards

-


a CraZy PeNguIn

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

------------------------- Andy Wood - VCP3 & 4 . MCITP:EA . MCSE:S . CCA . CCNA . Sec+ http://www.acrazypenguin.com
thickclouds
Enthusiast

That's the thing. I cannot SSH in, and therefore cannot issue remote commands. I cannot get on the "true" console either. It's disconnected in vCenter, so no good there.

I keep going back to the fact that HA is working fine, since it shows the agent running in the cluster (judging by another cluster node's vpx logs...). Can HA somehow restart the management services?

weinstein5
Immortal

What about the physical console of the ESX host? Are you able to access that either directly or through iLO?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

Jasemccarty
Immortal

It sounds like the local file system has gone read-only. This has happened to me a couple of times on some older boxes.

The only way to recover is to reboot. I'd suggest remoting into the guests and shutting them down cleanly.

Once they are down, reboot the host.

There wasn't really much else I could do to remedy the situation.

Good luck.

Jase McCarty

http://www.jasemccarty.com

Co-Author: VMware ESX Essentials in the Virtual Data Center (ISBN:1420070274) Auerbach

Co-Author: VMware vSphere 4 Administration Instant Reference (ISBN:0470520728) Sybex

Please consider awarding points if this post was helpful or correct

Jase McCarty - Field SA at PureStorage - @jasemccarty
thickclouds
Enthusiast

Jase -

That's what I am afraid of. I am waiting on one last resort from VMware Engineering. We shall see...

Thanks everyone.

Jasemccarty
Immortal

In my case, the root of the issue was that I was running ESX 3.5 U3 on an IBM x440 (unsupported), and the firmware of the local disks didn't jibe with U3.

I rebuilt the box with 3.5 U2 and didn't have the problem after that. Fortunately, I don't have those x440s in production anymore.

Troy_Clavell
Immortal

Keep in mind it could be a hostd issue. If you get the right VMware TSE, they may be able to fix it without a reboot.

Good Luck!!

marvinms
Enthusiast

What backup software / process are you using?

This sounds very close to the issue I just had (other than the disconnected host, which was the only command available to one of the Jr. admins), which was caused by PhdVirtual esXpress 3.6.10 having a problem dealing with 2010.

thickclouds
Enthusiast

We don't use a backup agent, unfortunately. I think we are stuck.

danm66
Expert

No, HA won't restart the management services. If you run 'telnet hostnameorIP 443' and don't get a blank screen or any other kind of response, then things are looking really bad as far as avoiding a reboot of the VMs/host.

If you get a response on 443, you can try connecting directly to the host with the client.
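The telnet test above only needs a TCP connection to succeed, so it can also be scripted from any workstation that can reach the host. Here is a minimal sketch in Python, assuming Python 3 is available on that workstation. The function name port_responds and the host name in the usage comment are hypothetical, not anything shipped by VMware.

```python
import socket


def port_responds(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This mirrors 'telnet host 443': success means something accepted
    the connection, failure means refused, unreachable, or timed out.
    """
    try:
        # create_connection performs the TCP handshake and raises
        # OSError (including timeouts) if it cannot complete.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (hypothetical host name):
# print(port_responds("esx01.example.com"))
```

A False result corresponds to the no-response case described above. A True result only shows that something is listening on 443; it does not prove that hostd is healthy.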

At the physical/iLO/KVM screen, try Alt-F3 or another F# key to see if you can get an alternate console to come up, too.

timparkinsonShe
Enthusiast

Just to echo what Jasemccarty mentioned: I had the exact same symptoms when the local filesystems went read-only because of a RAID controller fault. I happened to have an SSH session up when it went, so I was able to do a bit of poking around. Not that it helped much, though: the only solution was to remotely log in to the machines, shut them down, and bring them back up on other hosts.
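For anyone who still has a shell open when this happens, one way to confirm the read-only theory is to look for the ro flag in the mount options listed in /proc/mounts. Below is a minimal sketch, assuming a Linux-style service console and that the text of /proc/mounts can be copied to a machine with Python 3. The function name read_only_mounts is made up for illustration.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device  mount_point  fs_type  options  dump  pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Usage on a Linux system:
# with open("/proc/mounts") as f:
#     print(read_only_mounts(f.read()))
```

If / or another local file system shows up in the result when it is normally mounted read-write, that is consistent with the file system having been remounted read-only after an error.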
