VMware Cloud Community
alsmaster
Enthusiast

After rebooting ESX 4.0, no vSphere Client access for 15+ minutes?

It seems that for the first 15-20 minutes after a reboot, the ESX 4.0 host is unreachable from the VI client. Running the vmware-cmd -l command from the console as root also produces a similar error.

Thinking this was an issue with my custom kickstart file, I tried installing the server manually using the virtual CD-ROM method. I'm still getting a delay of 15+ minutes after the host boots before I can connect to it with the VI client. I have attached a screenshot of the error messages:

[Screenshot: 6016_6016.jpg]

After 15 minutes, both operations succeed. Has anyone else encountered this?

bobby311
Enthusiast

Aug 12 08:39:20 pitt-dev-vm-02 xinetd[2809]: START: vmware-authd pid=29543 from=127.0.0.1
Aug 12 08:39:20 pitt-dev-vm-02 xinetd[2809]: EXIT: vmware-authd status=255 pid=29543 duration=0(sec)
Aug 12 08:39:23 pitt-dev-vm-02 xinetd[2809]: START: vmware-authd pid=29597 from=127.0.0.1
Aug 12 08:39:23 pitt-dev-vm-02 xinetd[2809]: EXIT: vmware-authd status=255 pid=29597 duration=0(sec)
Aug 12 08:39:24 pitt-dev-vm-02 xinetd[2809]: START: vmware-authd pid=29598 from=127.0.0.1
Aug 12 08:39:24 pitt-dev-vm-02 xinetd[2809]: EXIT: vmware-authd status=255 pid=29598 duration=0(sec)
Aug 12 08:39:27 pitt-dev-vm-02 xinetd[2809]: START: vmware-authd pid=29656 from=127.0.0.1
Aug 12 08:39:27 pitt-dev-vm-02 xinetd[2809]: EXIT: vmware-authd status=255 pid=29656 duration=0(sec)
Aug 12 08:39:27 pitt-dev-vm-02 xinetd[2809]: START: vmware-authd pid=29657 from=127.0.0.1
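
To see how long vmware-authd keeps failing after a boot, the xinetd entries above can be summarized from /var/log/messages. This is a minimal sketch, assuming Python 3 and the exact log line format quoted above; the function names are hypothetical and not part of any VMware tool.

```python
import re
from collections import Counter

# Matches xinetd EXIT entries for vmware-authd in the format quoted in this thread:
# "... xinetd[2809]: EXIT: vmware-authd status=255 pid=29543 duration=0(sec)"
EXIT_RE = re.compile(r"xinetd\[\d+\]: EXIT: vmware-authd status=(\d+)")


def authd_exit_statuses(lines):
    """Count how often vmware-authd exited with each status code."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def last_authd_failure(lines):
    """Return the syslog timestamp of the last non-zero vmware-authd exit, or None."""
    last = None
    for line in lines:
        match = EXIT_RE.search(line)
        if match and int(match.group(1)) != 0:
            # Classic syslog lines start with a 15-character timestamp, e.g. "Aug 12 08:39:27".
            last = line[:15]
    return last
```

For example, opening /var/log/messages and passing the file object to last_authd_failure gives the time of the last failed start, which you can compare with the boot time to measure the delay.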

Same errors 12 hours later... it makes no sense, and I still cannot connect the host to VC.

Here is what I did:

Installed vSphere on dev-vm-01

Created a VM on dev-vm-01 (W2K8) running VC and Update Manager

Added dev-vm-01 to dev-vc-01

Installed vSphere on dev-vm-02

Added dev-vm-02 to dev-vc-01

Ran Update Manager and applied the 17 updates to dev-vm-02

Rebooted dev-vm-02

Now the xinetd errors keep coming in and I cannot join the server to VC.

Has anyone else run into this?

toha
Enthusiast

Yes, I have.

There is definitely something wrong with the vmware-authd process: after a reboot it keeps dying under an excessive number of incoming connections. At some point it just starts working.

I'll need to open a support case for this tomorrow.

bobby311
Enthusiast

Did this start happening after you updated your host and rebooted it?

toha
Enthusiast

No, this is a fresh install with no patches or modifications whatsoever.

bobby311
Enthusiast

So I finally fixed my issue with the ESX host not connecting back to VC.

Get ready to laugh...

I called support:

tech: what build are you running?

me: 140xxx

tech: the most current build is 160xxx.....

tech: where did you get this build?!?!?

Yeah... so I was using the BETA copy of vSphere. :( :(

So I suggest that everyone check their build number!
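
Checking the build programmatically is straightforward once you have the version string (for example, from `vmware -v` on the service console). This is a minimal sketch, assuming the string contains a `build-<digits>` token; the build numbers used here are hypothetical, and the minimum build is whatever current release you compare against.

```python
import re


def esx_build_number(version_string):
    """Extract the build number from a string such as 'VMware ESX 4.0.0 build-123456'.

    The 'build-<digits>' format is an assumption; check it against your host's output.
    """
    match = re.search(r"build-(\d+)", version_string)
    if match is None:
        raise ValueError(f"no build number found in {version_string!r}")
    return int(match.group(1))


def is_older_than(version_string, minimum_build):
    """True if the host's build is older than minimum_build (a number you supply)."""
    return esx_build_number(version_string) < minimum_build
```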

DMcCoy
Contributor

I'm on the latest build and I still can't access mine for 10+ minutes after boot. The servers here have no external DNS access. I had the same issues on 3.0 and 3.5 (on different hardware).

antivir
Contributor

Guys, this is not a bug but a feature.

To accept a vSphere Client connection or to join a cluster, an ESX server must be able to resolve its own name (only its own name).

If any nameserver is listed in resolv.conf, ESX will try to resolve itself through that server regardless of what is in /etc/hosts. If that DNS server is unreachable or has no record for the ESX host, you get "503 Service Unavailable" for about 15 minutes. After that, ESX falls back to /etc/hosts.

ESX writes a record for itself to /etc/hosts automatically at every boot.

This only affects connecting to ESX.
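
The resolution order described above can be checked against the two files with a small script. This is a minimal sketch, assuming Python 3 and that you feed it the contents of /etc/resolv.conf and /etc/hosts; the function names are hypothetical, and it only inspects file contents rather than reproducing ESX's actual lookup logic.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def hosts_entries(hosts_text):
    """Map each name and alias in hosts-file content to its IP address."""
    mapping = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split into fields
        if len(fields) >= 2:
            for name in fields[1:]:
                mapping.setdefault(name, fields[0])  # first entry for a name wins
    return mapping


def self_resolution_warnings(hostname, resolv_conf_text, hosts_text):
    """List the conditions that, per the explanation above, can delay client access."""
    warnings = []
    if hostname not in hosts_entries(hosts_text):
        warnings.append(f"{hostname} is not listed in /etc/hosts")
    servers = nameservers(resolv_conf_text)
    if servers:
        warnings.append(
            f"nameserver(s) {', '.join(servers)} configured: each must be reachable "
            f"and hold a record for {hostname}"
        )
    return warnings
```

An empty list means the host can resolve itself from /etc/hosts and no DNS server is in the way; a nameserver warning is not an error by itself, only a reminder that the server must actually answer.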

For HA to work properly you need:

1. Either:

- a nameserver in resolv.conf (the nameserver must be reachable and must contain A and PTR records for all ESX servers in the cluster)

or

- an empty resolv.conf, and records for all ESX servers in the cluster in /etc/hosts on every ESX server in the cluster.

2. The ESX servers do not need any DNS records for vCenter.

3. HA works without vCenter.
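
The A/PTR requirement in item 1 can be spot-checked by resolving each ESX host name forward and then reversing the resulting address. This is a minimal sketch, assuming Python 3; the host names you pass in are your own, and the lookups go through whatever the machine running it uses (DNS or hosts file), so run it on each ESX server to check that server's view.

```python
import socket


def check_a_and_ptr(hostnames, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that each name resolves to an IP and that the IP resolves back to the name.

    Returns {hostname: problem description}; an empty dict means every check passed.
    """
    problems = {}
    for name in hostnames:
        try:
            ip = forward(name)  # A record (or hosts-file entry)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"forward lookup failed: {exc}"
            continue
        try:
            primary, aliases, _ = reverse(ip)  # PTR record (or hosts-file entry)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems[name] = f"reverse lookup of {ip} failed: {exc}"
            continue
        known = {primary.lower(), *(alias.lower() for alias in aliases)}
        if name.lower() not in known:
            problems[name] = f"{ip} reverses to {primary}, not {name}"
    return problems
```

The forward and reverse parameters exist only so the function can be tested with fake lookups; by default it uses the standard socket calls.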

For vCenter features (DPM, DRS) to work properly you need:

- an accessible DNS server that resolves vCenter itself and all ESX servers in the cluster (A and PTR records)

or

- no DNS server specified, and all ESX servers listed in windows/system32/drivers/etc/hosts (don't forget to add vCenter's fully qualified name, like vcenter.domain.com, because it needs to resolve itself too)

P.S. If you use /etc/hosts resolution and HA does not work, change every host record from

192.168.100.115 esx1.domain.com

to this:

192.168.100.115 esx1.domain.com esx1
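
Writing those entries by hand for every ESX server is error-prone. Here is a minimal sketch (the function name and the input mapping are hypothetical) that produces hosts-file lines in the FQDN-plus-short-name form suggested above; the same line format also works in the Windows hosts file mentioned for vCenter.

```python
def hosts_lines(hosts):
    """Build hosts-file lines from {fqdn: ip}, appending the short name as an alias."""
    lines = []
    for fqdn, ip in hosts.items():
        short = fqdn.split(".", 1)[0]  # "esx1.domain.com" -> "esx1"
        if short != fqdn:
            lines.append(f"{ip} {fqdn} {short}")
        else:
            lines.append(f"{ip} {fqdn}")  # already a short name; nothing to add
    return lines
```

For example, hosts_lines({"esx1.domain.com": "192.168.100.115"}) returns the single line "192.168.100.115 esx1.domain.com esx1".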

Sorry for bad English.

WBR, valhalla

duncanwannamake
Enthusiast

I don't think this is a feature, as there is no benefit to a 15-minute delay in bringing up the core ESX services.

In my experience, there seems to be some relation to the Dell Remote Access Controller (DRAC) providing virtual CD-ROM and floppy drives to the system. When I detach the virtual media in the DRAC configuration, the issue goes away. Unfortunately, we use DRAC virtual media as part of our setup automation, so disabling that functionality is not a good solution for us. But it does fix the issue without having to add a working DNS server to resolv.conf. Our other fix was to disable DNS resolution entirely by clearing the contents of /etc/resolv.conf. (A sledgehammer to the DNS; I hate doing this, but it works.)

I hope that VMware can release a proper fix to this issue, because the current workarounds are dirty hacks.

gsandorx
Contributor

I have an ESX 4.0 server (running on an HP ML 350 G5) in stand-alone mode (no vCenter). After updating to Update 1, I got the same error as everyone else: the vmware-authd issue (as shown in /var/log/messages).

I followed the recommendations made here: I removed my DNS entries from /etc/resolv.conf, and after rebooting it was working. But I went a little further: I double-checked my config and found that my ESX host was unable to reach my DNS servers. I fixed that, restored my old /etc/resolv.conf, and it was working again.

Evidently, some process that depends on DNS affects the vmware-authd daemon.
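
Sandor's root cause was a DNS server the ESX host could not reach. One way to test that directly is to send a single DNS query over UDP port 53 and see whether anything comes back. This is a minimal sketch, assuming Python 3 on the host; it builds a standard RFC 1035 A-record query by hand, and the function names and default query ID are hypothetical. A reply only proves the server is reachable: it can still be missing the A and PTR records discussed earlier in the thread.

```python
import socket
import struct


def encode_qname(name):
    """Encode a domain name in DNS wire format (length-prefixed labels, zero terminator)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label {label!r} in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, query_id=0x1234):
    """Build a minimal standard query for an A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=1 (A), QCLASS=1 (IN)
    return header + encode_qname(name) + struct.pack("!HH", 1, 1)


def dns_server_answers(server, name, timeout=2.0, query_id=0x1234):
    """Return True if `server` replies to our query (matching ID) within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, query_id), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeouts and unreachable networks both land here
            return False
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == query_id
```

For example, calling dns_server_answers with each nameserver address from /etc/resolv.conf and the host's own name (esx1.domain.com is the placeholder used earlier in the thread) shows which servers never answer.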

Cheers,

Sandor

serrato01
Contributor

This fixed my issue as well. Thanks for that, David!

z.b.

RParker
Immortal

All I can say is that this is normal. Maybe not great, but it's been doing this since vCenter 2.5 U3. I think VMware changed it so that the service comes up completely before you can interact with it, rather than letting you in before it has a chance to finish its housekeeping.

But the bigger question is WHY are you restarting so often that waiting 15 minutes EACH time is an issue? That's where you SHOULD be focusing your attention. So it takes 15 minutes to come up, big whoopie doo... You should ONLY be starting it ONCE, not so many times that this becomes an issue.

RParker
Immortal


> I don't think this is a feature, as there is no benefit to a 15 minute delay in bringing up the core ESX services.

Says the user who is obviously NOT a programmer, and therefore doesn't understand that a service requires many BACKGROUND operations. The service STARTS, and only THEN does it begin to do its work... Maybe it was changed to make the system more stable... maybe?

At any rate, the bottom line is that an ESX host comes up once. So you wait 15 minutes. OK, so how is that a problem? You are restarting your host so many times... and that's because...?

> There seems to be some relation in my experience to the Dell Remote Access Controller providing virtual CD-ROM and Floppy drives to the system.

First of all, the DRAC has ZERO correlation with the system. The fact that you can access the machine and gain console access through it is NOT related in any way. ESX is an OS that runs; ESX has NO IDEA a DRAC even exists...

cobra2497
Contributor

Didn't read through everything, but I got this same message. Make sure everything in resolv.conf is correct. After correcting it, you don't need to reboot; just run "service network restart" and it will clear.

disasteraverted
Contributor

Thanks for posting this, David. I have three 710s in our environment and experienced this same issue on one of the three... it must be that the other two didn't have this enabled by default (they were purchased at a different time).

Thanks!

Nick
