rmmorris
Contributor

hostd not running on startup / 503 Server Unavailable / 503 Service Unavailable

Hi,

Our work is starting to look at VMware ESXi server and I thought it would be nice to play around with the technology at home.  I recently downloaded the free version of VMware ESXi 5.1 and installed it on a computer that I wasn't using.  The install went without an issue and rebooted to the standard yellow screen.  I checked the settings for the management interface and everything looked fine to me.

I downloaded the VMware vSphere Client, installed it and then tried to connect to the ESXi host.  I entered the IP address of the server, the root username and the password I selected during install.  When I clicked the Login button, it asked me about the certificate which I ignored and then I got this error message:

vsphere-error.jpg

I tried accessing the web interface by going to http://<server IP address> which redirected me to https://<server IP address> and again mentioned the certificate.  I accepted the warning and then was presented with a page that simply said:

503 Service Unavailable

Doing some digging, I used the console on the server and enabled SSH.  I used PuTTY to connect to the IP address of the host and accepted the certificate.  I used root as the username and the password I selected during install and was able to get to the command prompt.  There don't appear to be any issues with communicating with the ESXi host from a network point of view.

I started doing some research on 503 Service Unavailable and 503 Server Unavailable but the problems that I saw others having seemed to be more related to upgrades rather than a new install.  Some people were able to resolve their issues by restarting the management agents from the console under the Troubleshooting Mode Options.  I did that with no change.  I also rebooted the server many times and even reinstalled it several times.  The results always seemed to remain the same.

I started looking through the log files to see if I could spot something obvious.  In one of the VMware KB articles, I found the command to check whether hostd was running.  It turns out in my case it wasn't:

~ # /etc/init.d/hostd status
hostd is not running.

I tried to manually stop hostd and start it:

~ # /etc/init.d/hostd stop
hostd is not running.
~ # /etc/init.d/hostd start
Unable to verify hostd started after 10 seconds
hostd started.

And here is what is logged in syslog.log:

2012-12-24T16:29:01Z watchdog-hostd: [10600] Begin 'hostd ++min=0,swapscope=system,group=hostd /etc/vmware/hostd/config.xml', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = ''
2012-12-24T16:29:01Z watchdog-hostd: Executing 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml'
2012-12-24T16:29:01Z watchdog-hostd: 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml' exited after 0 seconds (quick failure 1) 127
2012-12-24T16:29:01Z watchdog-hostd: Executing 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml'
2012-12-24T16:29:02Z watchdog-hostd: 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml' exited after 1 seconds (quick failure 2) 127
2012-12-24T16:29:02Z watchdog-hostd: End 'hostd ++min=0,swapscope=system,group=hostd /etc/vmware/hostd/config.xml', failure limit reached
2012-12-24T16:29:11Z watchdog-hostd: Unable to verify hostd started after 10 seconds
2012-12-24T16:30:01Z crond[2476]: crond: USER root pid 10855 cmd /sbin/hostd-probe

I'm not super familiar with ESXi 5.1, but it looks to me like watchdog-hostd is trying to start hostd using the configuration file located at /etc/vmware/hostd/config.xml, and hostd then exits with "(quick failure 1) 127".  It then appears to try starting hostd again, which exits with "(quick failure 2) 127", after which the watchdog quits with "failure limit reached".
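
For anyone following along, here is a minimal sketch of how the watchdog lines quoted above could be pulled out of a syslog file instead of being read by eye.  The function names quick_failures and last_exit are made up for illustration, and the sketch assumes the log lines keep the "watchdog-<service>:" prefix shown in the excerpt.

```shell
# Count the watchdog "quick failure" lines for one service in a syslog file.
# Usage: quick_failures <service> <logfile>
quick_failures() {
    grep "watchdog-$1:" "$2" | grep -c "quick failure"
}

# Print the most recent "exited after" line for one service.
# Usage: last_exit <service> <logfile>
last_exit() {
    grep "watchdog-$1:" "$2" | grep "exited after" | tail -n 1
}

# Example (on the host):
#   quick_failures hostd /var/log/syslog.log
#   last_exit hostd /var/log/syslog.log
```

The number at the very end of the line printed by last_exit is the value worth searching on.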

The log file for /var/log/hostd.log is completely empty.  When I was researching hostd not starting, it appears for others there are error messages in this log file which helped guide them to a solution but in my case there is nothing.

I've attached the entire syslog.log file for the first startup while running these commands.  I've also attached the messages reported in syslog.log for just the part where I restarted the management agents.

I'd really like to get ESXi 5.1 up and running so that I can start playing around with it, but I'm at a loss as to what I can try next to figure out why hostd fails to start every time, even after a fresh install.  Can anyone suggest something that I can do to either resolve the issue or at least figure out what the problem might be?

Thank you.

11 Replies
JCMorrissey
Expert

Hi,

Take a look at:

http://communities.vmware.com/thread/421269

Also, do you have your default gateway (DG) configured correctly? This one relates to 4.x, but see:

http://communities.vmware.com/message/1265174

Please consider marking as "helpful", if you find this post useful. Thanks!... http://johncmorrissey.wordpress.com/
rmmorris
Contributor

Hi,

I looked at the two links that you had provided.  For the first one, I did try resetting the System Configuration from dcui and when the ESXi host rebooted, it came back up with what appeared to be the same configuration along with the same issue.

Regarding the second link, I did check out my gateway and it appears to be correct.  I found a command to list the routes table and I was able to ping the gateway without any issues:

esxcfg-route Results.png

Ping Gateway.png

I was also able to ping Google and perform a traceroute:

ping.png

traceroute.png

The only thing with the traceroute that I wasn't sure of was hop 2 and hop 10, as they both had * * * instead of information.  I don't normally use traceroute so I'm not sure if that is normal or not.  I figure that since the ESXi host was able to get out on the internet, it wasn't a gateway issue.

I did some more searching and came across this command: esxcli network ip route ipv4 list

I thought it might be similar to the esxcfg-route command but when I ran it, I got this error message:

esxcli network Results.png

I don't know if the esxcli commands use a local service that might not be running similar to the issue that I am having with the hostd service.  I went back into dcui and ran the Test Management Network and something was different.  It failed the lookup check:

Test Management Network.png

In the past all three options reported OK.  I checked /etc/hosts and it looks fine to me:

hosts Output.png

Even though the dcui test failed to resolve the hostname localhost, I was still able to ping it and perform an nslookup on it, and both returned the correct results:

localhost Tests.png

I've done another re-install of the ESXi host just to make sure that me testing different things wasn't breaking anything and even with a fresh install I still get the 503 Server Unavailable and 503 Service Unavailable messages.

Is there a way to increase the log details to maybe shed some light as to what is going on?  Is there anything else I can try to see why this isn't working?

Thanks!

JCMorrissey
Expert

Hi,

Just to rule it out did you drop the firewall ports?

# esxcli network firewall set --enabled false

and see if it recurs? I know it's not elegant, but it's just to rule the firewall out. You can reboot too, and the firewall should stay disabled on all ports.

Should see

# esxcli network firewall get
   Default Action: DROP
   Enabled: false
   Loaded: true

rmmorris
Contributor

Hi,

When I try to run both commands all I get is:

Connect to localhost failed: Connection failure

When I tried previous esxcli commands in the past, I believe they all returned the same error message above.  My guess is that esxcli commands must communicate with some kind of service that isn't running on the host maybe similar to hostd.  Is there another way to temporarily disable the firewall altogether?
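
A hedged sketch of one way to work around this: ESXi 5.x ships a localcli utility that takes the same namespaces as esxcli but does not go through hostd.  It is meant for troubleshooting situations like this one rather than everyday use.  The wrapper name run_cli is made up for illustration, and the sketch assumes localcli is present on the host and accepts the same arguments.

```shell
# Try esxcli first; if it fails (for example because hostd is down),
# retry the same arguments with localcli.
# Assumption: localcli exists on the host and mirrors the esxcli namespaces.
run_cli() {
    esxcli "$@" 2>/dev/null || localcli "$@"
}

# Example (on the host):
#   run_cli network firewall get
#   run_cli network firewall set --enabled false
```

If esxcli works, the wrapper behaves exactly like esxcli; the fallback only kicks in when esxcli returns a failure.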

Thanks.

JCMorrissey
Expert

So in terms of networking, are you using a static address or DHCP to pick up your management IP address?

rmmorris
Contributor

I've tried both ways.  DHCP is handled by my wireless router.  I've assigned a specific IP address for the MAC address of the network card that the management interface is tied to.  In both cases, I've used the same set of IP addresses for the management interface, gateway, and DNS server and the results are the same.

Last night I installed ESXi 5.1 under VMware Workstation 8 on my desktop computer to see if I would get the same results.  I did a straightforward install using the default DHCP configuration.  ESXi installed fine under the virtual machine and was assigned an IP address by my router.  I then used vSphere Client to connect to that IP address and I got the same certificate warning, which I ignored, but then I was logged in and could see the host.  So without making any changes to the default DHCP assigned values, everything worked.

I decided to remove the static address assignment from the DHCP server for my actual server and once again re-installed ESXi 5.1 on the physical server.  After the reboot from the install, I could see that the physical server had been assigned a different IP address from the DHCP address pool.  I tried to connect vSphere Client to the assigned IP address and got the same 503 Server Unavailable / 503 Service Unavailable error messages.  I compared the network settings between the physical server and the virtual server and they are identical with the exception of the IP addresses assigned to the management interfaces.

Today I checked to see if my motherboard had any BIOS updates and there was one.  I downloaded the latest BIOS firmware and installed it.  I also re-installed the ESXi host one more time and got the exact same error messages.

I don't think it's network related, at least in terms of infrastructure, as the virtual machine was able to work on the very first attempt without a single change to the post-install setup.  I think it is related specifically to the computer I am trying to install ESXi 5.1 on, but I don't know what the issue could be.  It installs fine to the hard drive.  I can access the host via SSH.  It recognizes all the memory installed in the machine.  It recognizes the processor.  It doesn't complain about any issues during the install.  It even appears to boot without incident to the black and orange display.

It looks to me like the hostd service just fails to start.  I compared /var/log/syslog.log between the two ESXi installs to see if I could tell why one works and the other doesn't.  Nothing really stands out to me, but at the same time I'm not exactly sure what I am looking for.  I did check the area where hostd starts and I found this:

2013-01-01T16:40:56Z watchdog-hostd: [5091] Begin 'hostd ++min=0,swapscope=system,group=hostd /etc/vmware/hostd/config.xml', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = ''
2013-01-01T16:40:56Z watchdog-hostd: Executing 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml'
2013-01-01T16:40:56Z watchdog-hostd: 'hostd ++min=0,swapscope=system,group=host/vim/vmvisor/hostd /etc/vmware/hostd/config.xml' exited after 0 seconds (quick failure 1) 127

In both installs, the first two lines were identical except for the date/time/process ID.  The difference is that on the install that gives me the 503 errors, it shows that hostd exited after 0 seconds with "(quick failure 1) 127".  On the install where hostd fails to start, I also noticed that vpxa fails to start with this error message:

2013-01-01T16:41:12Z watchdog-vpxa: [5419] Begin '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=vpxa -D /etc/vmware/vpxa', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = ''
2013-01-01T16:41:12Z watchdog-vpxa: Executing '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=host/vim/vmvisor/vpxa -D /etc/vmware/vpxa'
2013-01-01T16:41:12Z watchdog-vpxa: '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=host/vim/vmvisor/vpxa -D /etc/vmware/vpxa' exited after 0 seconds (quick failure 1) 139
2013-01-01T16:41:12Z watchdog-vpxa: Executing '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=host/vim/vmvisor/vpxa -D /etc/vmware/vpxa'
2013-01-01T16:41:13Z watchdog-vpxa: '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=host/vim/vmvisor/vpxa -D /etc/vmware/vpxa' exited after 1 seconds (quick failure 2) 139
2013-01-01T16:41:13Z watchdog-vpxa: End '/usr/lib/vmware/vpxa/bin/vpxa ++min=0,swapscope=system,group=vpxa -D /etc/vmware/vpxa', failure limit reached
2013-01-01T16:41:22Z watchdog-vpxa: Unable to verify vpxa started after 10 seconds

I'm guessing that they are related somehow, but I am not sure what the relationship is between these two processes/services.
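
The side-by-side comparison of the two installs can be made less manual by stripping the parts that always differ (timestamp and process ID) before comparing.  This is only a rough sketch: normalize_log is a made-up helper name, the file names in the example are placeholders, and it assumes every line starts with an ISO timestamp like the excerpts above.

```shell
# Strip the leading timestamp and replace bracketed PIDs with a fixed token
# so that logs from two different boots can be compared line by line.
# Usage: normalize_log <logfile>
normalize_log() {
    sed -e 's/^[0-9TZ:-]* //' -e 's/\[[0-9][0-9]*\]/[PID]/g' "$1"
}

# Example, with one syslog.log copied from each install (placeholder names):
#   normalize_log working-syslog.log > /tmp/working.txt
#   normalize_log failing-syslog.log > /tmp/failing.txt
#   diff /tmp/working.txt /tmp/failing.txt
```

Lines that only differ in boot order will still show up in the diff, so the output needs some reading, but the "exited after" lines should stand out.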

Is there any way to increase the logging information around the hostd or vpxa to see if they may say why they exit?  I have tried searching to see what the 127 and 139 numbers reported in the log file might mean but haven't found anything.
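
On the 127 and 139 question, a hedged note: if the trailing number in the watchdog line is an ordinary shell-style exit status (an assumption, since the log does not say so), then 127 conventionally means the program could not be found or loaded, and values above 128 mean the process was killed by signal number (status - 128).  139 would then be signal 11, which is a segmentation fault on Linux-like systems.  The helper name decode_status below is made up for illustration.

```shell
# Interpret an exit status using the common shell convention.
# Assumption: the watchdog reports statuses the way a POSIX shell would.
# Usage: decode_status <number>
decode_status() {
    code=$1
    if [ "$code" -eq 127 ]; then
        echo "command not found or could not be loaded"
    elif [ "$code" -eq 126 ]; then
        echo "command found but not executable"
    elif [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
    else
        echo "exited with status $code"
    fi
}

# Example:
#   decode_status 127
#   decode_status 139
```

Read this way, hostd never really gets going (which would fit the empty hostd.log), while vpxa starts and then crashes.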

Thanks.

JCMorrissey
Expert

Hi,

One of the key things that jumps out at me from the last post is "I could see that the physical server had been assigned a different IP address from the DHCP address pool." Did it give you one of those automatic private IP addresses (APIPA, 169.254.x.x)?

If so it certainly isn't picking up your DHCP server.

If you install Windows, for example, on the physical server, it would be interesting to see if it returns the same. If it doesn't (e.g. it returns an IP address from your DHCP pool), then it would be the load balancing scheme that would need looking at. You don't have LACP enabled on your switch?

Many thanks

rmmorris
Contributor

What I meant by that was that the physical server received a different IP address from the DHCP pool than the virtual machine.  All other settings were the same (i.e. the gateway, DNS, etc.).  Both servers (the physical server and the virtual machine) received valid IP addresses from the DHCP server, addresses in the 192.168.0.1xx range.  Even though the physical server returns the 503 error messages, I can still access the server via SSH when enabled in the console.  This is what makes this so frustrating: I can access the server over the network, I just can't access it using the vSphere client or the web interface because they don't appear to be running on the host.

Thanks.

rmmorris
Contributor

Thanks JC for all of your help but unfortunately I have decided not to pursue trying to get ESXi running on my computer.  Without any additional "debug" information, there is just no way for me to know why some of the essential services are not running on startup.  I think my virtual server experiment shows that it isn't a network issue since the virtual server install of ESXi works fine.  It must be something to do with the actual computer but I have no idea what that might be.

Thanks again.

ksk_jp
Contributor

Hi,

I also had the same problem.

# cp /etc/vmware/hostd/config.xml /etc/vmware/hostd/config.xml.org

# cp /etc/vmware/hostd/.#config.xml /etc/vmware/hostd/config.xml

# chmod 644 /etc/vmware/hostd/config.xml

# services.sh restart

It is repaired now.
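
The same steps can be wrapped in a small function that refuses to overwrite anything when the hidden copy is missing or empty.  This is just a cautious sketch of the commands above, not an official procedure, and restore_hostd_config is a made-up name.

```shell
# Restore config.xml from the hidden ".#config.xml" copy in the same directory,
# keeping the current file as config.xml.org so the change can be undone.
# Usage: restore_hostd_config /etc/vmware/hostd
restore_hostd_config() {
    dir=$1
    src="$dir/.#config.xml"
    dst="$dir/config.xml"
    # Do nothing if the hidden copy is missing or empty.
    if [ ! -s "$src" ]; then
        echo "no usable $src, nothing changed" >&2
        return 1
    fi
    # Back up the current file, if there is one.
    if [ -e "$dst" ]; then
        cp "$dst" "$dst.org" || return 1
    fi
    cp "$src" "$dst" && chmod 644 "$dst"
}

# Example (on the host), followed by the restart from the steps above:
#   restore_hostd_config /etc/vmware/hostd && services.sh restart
```
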

maliboo74
Contributor

I had these same symptoms after updating from 5.0 u1 to 5.1 and the copy recommendation above from ksk_jp worked for me as well.  Thank you very much for sharing!

# cp /etc/vmware/hostd/config.xml /etc/vmware/hostd/config.xml.org

# cp /etc/vmware/hostd/.#config.xml /etc/vmware/hostd/config.xml

# chmod 644 /etc/vmware/hostd/config.xml

# services.sh restart