alsmaster
Enthusiast
Enthusiast

After rebooting ESX 4.0 no vSphere Client access for 15+ minutes?

It seems that in the first 15-20 minutes after a reboot, the ESX 4.0 host server is unreachable by the VI client. Also, running the vmware-cmd –l command from the console as root produces a similar error.

Thinking this was an issue with my custom kickstart file, I tried installing the server manually using the Virtual CDROM method. I’m still getting a delay of 15+ minutes before I can connect to the host with a VI

client after the host boots up. I have attached a screenshot of the error messages:

6016_6016.jpg

After 15 minutes both operations are successful. Anyone encounter this?

0 Kudos
33 Replies
glynnd1
Expert
Expert

What hardware are you installing on?

Anything in your kickstart file that is interesting / strange that could be mis-configuring things?

0 Kudos
jarsenea
Contributor
Contributor

I've had this on a brand spanking new clean install of ESX and opened a SR with VMware on the issue. Sent them the logs, but they could not figure it out. This was my problem:

The pNIC that the Service Console resided on was plugged into an Internal management LAN. At first glance, I didn't want to put the COS on our external LAN (you see, where I work, all machines plugged into the network have direct access to the Internet with their own externally accessible IP address). So I told myself I'd put the COS on something internal that coudn't be accessed from the outside (seemed like a good precaution - nobody has access to the internal management LAN).

When I moved the COS to a pNIC that was externally accessible, my problems went away.

I know this isn't much help, but hopefully it guides you in the right direction. Looking at my logs, there was absolutely nothing that indicated that a service failed to start upon bootup or some other issue that may cause the network to be brought up slowly. The only thing I can think of is ESX trying to access something outside (maybe the gateway or something else) and it keeps failing until it finally gives up and starts a service).

Let me know if you do end up figuring this out for yourself.

0 Kudos
duncan_wannamak
Contributor
Contributor

What hardware are you installing on?

Anything in your kickstart file that is interesting / strange that could be mis-configuring things?

The system is a Dell 1950 with a Perc 6/i controller for the local drives and a Q-Logic FC HBA. I eliminated the kickstart config and did a manual (graphical) install with DHCP and manual IP configurations and I get the same problem. The IP configuration seems fine because I can ssh into the server within 2 minutes of a reboot, but I'm unable to log in with the VI client for at least 15 minutes, sometimes 20. I looked through /var/log/messages and here is what I found during boot up:

Kernel finishes loading:

Aug 26 15:57:48 esx01 kernel: Netfilter messages via NETLINK v0.30.
Aug 26 15:57:48 esx01 kernel: ip_conntrack version 2.4 (1600 buckets, 12800 max) - 304 bytes per conntrack
Aug 26 15:57:48 esx01 watchdog-openwsmand: Begin '/sbin/openwsmand ++min=0,swap,group=wsman --syslog=3 --foreground-process', min-uptime = 60, max-qu
ick-failures = 5, max-total-failures = 1000000
Aug 26 15:57:48 esx01 watchdog-openwsmand: Executing '/sbin/openwsmand ++min=0,swap,group=wsman --syslog=3 --foreground-process'
Aug 26 15:57:48 esx01 kernel: i2c /dev entries driver

Errors begin:

Aug 26 15:57:49 esx01 xinetd[2751]: bind failed (Address already in use (errno = 98)). service = vmware-authd
Aug 26 15:57:49 esx01 xinetd[2751]: Service vmware-authd failed to start and is deactivated.
Aug 26 15:57:49 esx01 xinetd[2751]: bind failed (Address already in use (errno = 98)). service = vmware-authd-mks
Aug 26 15:57:49 esx01 xinetd[2751]: Service vmware-authd-mks failed to start and is deactivated.
Aug 26 15:57:49 esx01 xinetd[2751]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Aug 26 15:57:49 esx01 xinetd[2751]: Started working: 0 available services
Aug 26 15:57:49 esx01 watchdog-vobd: PID file /var/run/vmware/watchdog-vobd.PID not found
Aug 26 15:57:49 esx01 watchdog-vobd: Unable to terminate watchdog: Can't find process
Aug 26 15:57:49 esx01 watchdog-vobd: Begin '/usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vobd', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000
Aug 26 15:57:49 esx01 watchdog-vobd: Executing '/usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vobd'
Aug 26 15:57:50 esx01 watchdog-vmklogger: PID file /var/run/vmware/watchdog-vmklogger.PID not found
Aug 26 15:57:50 esx01 watchdog-vmklogger: Unable to terminate watchdog: Can't find process


Aug 26 15:57:50 esx01 watchdog-vmklogger: Begin '/usr/sbin/vmklogger', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000
Aug 26 15:57:50 esx01 watchdog-vmklogger: Executing '/usr/sbin/vmklogger'
Aug 26 15:57:50 esx01 watchdog-cdp: PID file /var/run/vmware/watchdog-cdp.PID not found
Aug 26 15:57:50 esx01 watchdog-cdp: Unable to terminate watchdog: Can't find process

Aug 26 15:58:02 esx01 watchdog-webAccess: PID file /var/run/vmware/watchdog-webAccess.PID not found
Aug 26 15:58:02 esx01 watchdog-webAccess: Unable to terminate watchdog: Can't find process
Aug 26 15:58:02 esx01 xinetd[2751]: START: vmware-authd pid=3449 from=127.0.0.1
Aug 26 15:58:02 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3449 duration=0(sec)
Aug 26 15:58:02 esx01 xinetd[2751]: START: vmware-authd pid=3450 from=127.0.0.1
Aug 26 15:58:02 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3450 duration=0(sec)
Aug 26 15:58:06 esx01 xinetd[2751]: START: vmware-authd pid=3522 from=127.0.0.1
Aug 26 15:58:06 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3522 duration=0(sec)
Aug 26 15:58:06 esx01 xinetd[2751]: START: vmware-authd pid=3523 from=127.0.0.1
Aug 26 15:58:06 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3523 duration=0(sec)
Aug 26 15:58:10 esx01 xinetd[2751]: START: vmware-authd pid=3599 from=127.0.0.1
Aug 26 15:58:10 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3599 duration=0(sec)
Aug 26 15:58:10 esx01 xinetd[2751]: START: vmware-authd pid=3600 from=127.0.0.1
Aug 26 15:58:10 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3600 duration=0(sec)
Aug 26 15:58:13 esx01 xinetd[2751]: START: vmware-authd pid=3668 from=127.0.0.1
Aug 26 15:58:13 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3668 duration=0(sec)
Aug 26 15:58:13 esx01 xinetd[2751]: START: vmware-authd pid=3669 from=127.0.0.1
Aug 26 15:58:13 esx01 xinetd[2751]: EXIT: vmware-authd status=255 pid=3669 duration=0(sec)
... 10 minutes of this output ...


Aug 26 16:10:16 esx01 xinetd[2751]: EXIT: vmware-authd status=0 pid=24497 duration=0(sec)
Aug 26 16:10:17 esx01 /usr/lib/vmware/bin/vmware-hostd[2946]: Accepted password for user root from 127.0.0.1
Aug 26 16:10:17 esx01 xinetd[2751]: START: vmware-authd pid=24557 from=127.0.0.1
Aug 26 16:10:17 esx01 xinetd[2751]: EXIT: vmware-authd status=0 pid=24557 duration=0(sec)

Also, in the /var/log/vmware/hostd.log I noticed the following:

2009-08-26 15:58:54.536 F637FB90 warning 'Proxysvc Req00002' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.545 F6442B90 warning 'Proxysvc Req00001' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.626 F6401B90 warning 'Proxysvc Req00003' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.640 F6401B90 warning 'Proxysvc Req00004' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.750 F6401B90 warning 'Proxysvc Req00005' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.757 F637FB90 warning 'Proxysvc Req00006' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.864 F63C0B90 warning 'Proxysvc Req00007' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).
2009-08-26 15:58:54.915 F6401B90 warning 'Proxysvc Req00010' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).

...5 minutes of this...

2009-08-26 16:03:52.740 F63C0B90 warning 'Proxysvc Req01193' Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).

after this last message the hostd.log finally begins starting services:

2009-08-26 16:08:13.037 F63C0B90 info 'ha-license-manager' Update: 86304 minutes left in eval period
2009-08-26 16:08:13.750 F5AF4B90 verbose 'DvsManager' PersistAllDvsInfo called
2009-08-26 16:10:14.490 F662E6D0 info 'Proxysvc' Plugin started
2009-08-26 16:10:14.498 F662E6D0 info 'Snmpsvc' ProcessConfig: SNMP Agent enabled: false, communities: 0, sinks 0
2009-08-26 16:10:14.498 F662E6D0 info 'Snmpsvc' Start: Not configured to run
2009-08-26 16:10:14.498 F662E6D0 info 'Solo' webServer/port: 8309
2009-08-26 16:10:14.498 F662E6D0 verbose 'WelcomePageCustomizer' Returned following links: LocalizedLink(), LocalizedLink()
2009-08-26 16:10:14.498 F662E6D0 verbose 'WelcomePageCustomizer' Created customizer with oem file '/etc/vmware/oem.xml' and dynDataUrl '/dyndata.js'.
2009-08-26 16:10:14.499 F662E6D0 info 'HTTP server' HTTP server created with docroots /usr/lib/vmware/hostd/docroot/
2009-08-26 16:10:14.499 F662E6D0 verbose 'HTTP server /host' HTTP accessible configuration files configuration: /etc/vmware/hostd/webAccessibleConfigFiles.xml
2009-08-26 16:10:14.508 F662E6D0 info 'Solo' soapPort: 8307
2009-08-26 16:10:14.508 F662E6D0 verbose 'Solo' Registering AuthMgr listener
2009-08-26 16:10:14.511 F662E6D0 verbose 'App' Looking up object with name = "HaNetwork-VM Network" failed.
2009-08-26 16:10:14.511 F662E6D0 verbose 'App' Looking up object with name = "ha-view-mgr-impl" failed.
2009-08-26 16:10:14.511 F662E6D0 verbose 'App' Registered network VM Network
2009-08-26 16:10:14.511 F662E6D0 info 'Solo' VM inventory configuration: /etc/vmware/hostd/vmInventory.xml

I'm going to be installing ESX4 on my IBM laptop to see if a totally different hardware environment makes a difference. Will update as I find out more.

0 Kudos
sevo77
Contributor
Contributor

I`ve exactlly the same problem. 15+ no access throught Clinet. The hardware is Supermicro CSBC-Q, Core2 Duo E8400 +8GB SuperTallent CL4. Can somebody help?

0 Kudos
wenlaoshi
Contributor
Contributor

I had the same problem. The problem was solved by changing /etc/hosts like this

127.0.0.1 host.name.com host

#10.1.2.3 host.name.com

funny

0 Kudos
Debcha
Contributor
Contributor

Hi,

i have exactly the same problem with ESX4 as a VMware Workstation 6.5.2 - VM.

@ wenlaoshi

unfortunately your solution doesn't solve this for me. By the way if you change something in /etc/hosts the entries are changed/overwritten after every host reboot !?

It's very annoying because i want to demonstrate vSphere to customers (vSphere LAB on a laptop), but with this issue it's not possible.

The same problem here:

http://communities.vmware.com/message/1266874

http://communities.vmware.com/message/1294240

http://communities.vmware.com/thread/219256

Please VMware solve this big issue or publish a workaround.

0 Kudos
andosm
Enthusiast
Enthusiast

Hello,

do you have any update on this problem? I also have it in my LAB, but only on ESX 4.0 VM's. ESXi 4.0 VM's are ok.

--

Best Regards

Andrzej

Pozdrawiam/Best Regards Andrzej
0 Kudos
wenlaoshi
Contributor
Contributor

This is not really a solution - let's call it a workaround:

After deleting all entries in /etc/resolv.conf the problem disappears in my installation.

Please tell me if this solves the problem in your cas too.

0 Kudos
ShaneWendel
Enthusiast
Enthusiast

I have the same situation (my COS is on a private out-of-band VLAN) and the same symptom.

-


Shane Wendel, VCP

----------------- Shane Wendel VCP: vSphere 4 VCP: VI3 http://fatalsync.wordpress.com
0 Kudos
CSMWAF
Contributor
Contributor

Got exactly the same problem.

in the /var/log/messages you see:

Jul 22 09:21:25 ktesx02 xinetd[2438]: EXIT: vmware-authd status=255 pid=3106 duration=1(sec)

Jul 22 09:21:27 ktesx02 xinetd[2438]: START: vmware-authd pid=3180 from=127.0.0.1

Jul 22 09:21:28 ktesx02 xinetd[2438]: EXIT: vmware-authd status=255 pid=3180 duration=1(sec)

Jul 22 09:21:28 ktesx02 xinetd[2438]: START: vmware-authd pid=3182 from=127.0.0.1

Jul 22 09:21:28 ktesx02 xinetd[2438]: EXIT: vmware-authd status=255 pid=3182 duration=0(sec)

Jul 22 09:21:31 ktesx02 xinetd[2438]: START: vmware-authd pid=3256 from=127.0.0.1

Jul 22 09:21:33 ktesx02 xinetd[2438]: EXIT: vmware-authd status=255 pid=3256 duration=2(sec)

Jul 22 09:21:33 ktesx02 xinetd[2438]: START: vmware-authd pid=3266 from=127.0.0.1

Jul 22 09:21:33 ktesx02 xinetd[2438]: EXIT: vmware-authd status=255 pid=3266 duration=0(sec)

after deleteing all entries in resolve.conf and hosts it was working instandly after a reeboot

0 Kudos
andosm
Enthusiast
Enthusiast

It looks like your workaround works for me.

I removed all "nameserver" entries from resolv.conf. I only left "domain" entry.

After reboot I have only few entries like below in /var/log/messages

Jul 22 21:26:49 esx401 xinetd[2164]: EXIT: vmware-authd status=255 pid=4848 duration=0(sec)

Jul 22 21:26:50 esx401 xinetd[2164]: START: vmware-authd pid=4850 from=127.0.0.1

Jul 22 21:26:50 esx401 xinetd[2164]: EXIT: vmware-authd status=255 pid=4850 duration=0(sec)

Jul 22 21:26:53 esx401 xinetd[2164]: START: vmware-authd pid=5121 from=127.0.0.1

Jul 22 21:26:54 esx401 xinetd[2164]: EXIT: vmware-authd status=255 pid=5121 duration=1(sec)

Jul 22 21:26:54 esx401 xinetd[2164]: START: vmware-authd pid=5122 from=127.0.0.1

Jul 22 21:26:54 esx401 xinetd[2164]: EXIT: vmware-authd status=255 pid=5122 duration=0(sec)

Jul 22 21:26:57 esx401 xinetd[2164]: START: vmware-authd pid=5180 from=127.0.0.1

Jul 22 21:26:57 esx401 xinetd[2164]: EXIT: vmware-authd status=0 pid=5180 duration=0(sec)

they end with

Jul 22 21:26:58 esx401 /usr/lib/vmware/bin/vmware-hostd[2384]: Accepted password for user root from 127.0.0.1

and everything works as expected.

Thanks for this!

Best Regards

Andrzej

Message was edited by: andosm

Pozdrawiam/Best Regards Andrzej
0 Kudos
CSMWAF
Contributor
Contributor

in my case i don't use any HA or vmotion.

How are you doing these features?

Enter all other servers to the hosts file?

0 Kudos
andosm
Enthusiast
Enthusiast

I have 3 hosts in my lab. 2 x ESXi 4.0 and 1 x ESX 4.0. They are all VM's on ONE physical box with ESXi 4.0 Smiley Happy

I don't use DNS. I have all hosts in vCenter hosts file.

On each ESX hosts file there's only host IP and vCenter host IP address. Host IP is added to hosts file during install/setup, vCenter host IP addres is added during adding ESX host to vCenter Server.

HA and VMotion features works without problem. I have plans to test FT.

--

Best Regards

Andrzej

Pozdrawiam/Best Regards Andrzej
0 Kudos
DaviuS
Contributor
Contributor

Same problem here, with a clean ESX 4 install on Dell PowerEdge R710 hardware.

I also have the same symptoms for 10 or 15 minutes and these lines in the event log :

Jul 23 16:56:04 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=17364 from=127.0.0.1
Jul 23 16:56:04 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=255 pid=17364 duration=0(sec)
Jul 23 16:56:04 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=17365 from=127.0.0.1
Jul 23 16:56:04 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=255 pid=17365 duration=0(sec)
Jul 23 16:56:07 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=17419 from=127.0.0.1
Jul 23 16:56:07 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=255 pid=17419 duration=0(sec)
Jul 23 16:56:07 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=17420 from=127.0.0.1
Jul 23 16:56:07 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=255 pid=17420 duration=0(sec)
Jul 23 16:56:09 aaiparisesx1 kernel: http:// 788.255313 sd 6:0:4:0: scsi: Removable device failed error recovery - setting offline.
* Jul 23 16:56:09 aaiparisesx1 kernel: http:// 788.265803 sd 6:0:4:0: still retrying 477 after 180s*
* Jul 23 16:56:09 aaiparisesx1 kernel: http:// 788.271067 sd 6:0:4:0: rejecting I/O to offline device*
Jul 23 16:56:09 aaiparisesx1 sfcb[5196|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=5196]: VICimInstBuilder create failed
Jul 23 16:56:14 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=18495 from=127.0.0.1
Jul 23 16:56:14 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=0 pid=18495 duration=0(sec)
Jul 23 16:56:14 aaiparisesx1 /usr/lib/vmware/bin/vmware-hostd[3623|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=3623]: Accepted password for user root from 127.0.0.1
Jul 23 16:56:15 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: START: vmware-authd pid=18692 from=127.0.0.1
Jul 23 16:56:15 aaiparisesx1 xinetd[2810|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=2810]: EXIT: vmware-authd status=0 pid=18692 duration=0(sec)
Jul 23 16:56:15 aaiparisesx1 /usr/lib/vmware/bin/vmware-hostd[3623|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=3623]: Accepted password for user root from 127.0.0.1
Jul 23 16:56:24 aaiparisesx1 cimslp: SLP data collection finished
Jul 23 16:56:30 aaiparisesx1 /usr/lib/vmware/bin/vmware-hostd[3623|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=3623]: Accepted password for user vpxuser from 127.0.0.1
Jul 23 16:56:53 aaiparisesx1 last message repeated 2 times
Jul 23 17:01:01 aaiparisesx1 syslogd 1.4.1: restart.
Jul 23 17:22:35 aaiparisesx1 /usr/lib/vmware/bin/vmware-hostd[3623|http://communities.vmware.com/community-document-picker.jspa?communityID=&subject=3623]: Accepted password for user root from 127.0.0.1

Looking at the vmkernel log, it seems that ESX is probing for USB storages while the problem appends :

Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:34.705 cpu1:4236)usb storage warning (0 throttled) on vmhba35 (SCSI cmd MODE_SENSE): clearing endpoint halt for pipe 0xc0008580
Jul 23 16:45:34 aaiparisesx1 vmkernel: usb storage message on vmhba35: Bulk data transfer result 0x0
Jul 23 16:45:34 aaiparisesx1 vmkernel: usb storage message on vmhba35: scsi cmd done, result=0x0
Jul 23 16:45:34 aaiparisesx1 vmkernel: usb storage message on vmhba35: Bulk command transfer result=0
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:34.706 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x1a (0x4100060a0440) to NMP device "mpx.vmhba35:C0:T0:L0" failed on physical path "vmhba35:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:34.706 cpu1:4097)ScsiDeviceIO: 747: Command 0x1a to device "mpx.vmhba35:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:34.723 cpu0:4096)VMNIX: VmkDev: 2249: Added SCSI device vml0:4:0 (mpx.vmhba35:C0:T0:L0)
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:35.168 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x41000610f800) to NMP device "mpx.vmhba35:C0:T0:L0" failed on physical path "vmhba35:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:35.168 cpu1:4097)ScsiDeviceIO: 747: Command 0x12 to device "mpx.vmhba35:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:35.171 cpu1:4097)NMP: nmp_CompleteCommandForPath: Command 0x12 (0x41000610f600) to NMP device "mpx.vmhba35:C0:T0:L0" failed on physical path "vmhba35:C0:T0:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:34 aaiparisesx1 vmkernel: 0:00:02:35.171 cpu1:4097)ScsiDeviceIO: 747: Command 0x12 to device "mpx.vmhba35:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:35.675 cpu1:4182)VMK_PCI: 728: device 005:00.0 capType 16 capIndex 76
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:35.708 cpu3:4181)VMK_PCI: 728: device 005:00.1 capType 16 capIndex 76
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:36.240 cpu3:4198)<6>usb 2-3: USB disconnect, address 5
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:36.240 cpu3:4198)usb storage warning (0 throttled) on unknown (SCSI cmd unknown): usb_stor_stop_transport called
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:36.240 cpu3:4198)WARNING: Removing Host Adapter vmhba35
Jul 23 16:45:35 aaiparisesx1 vmkernel: 0:00:02:36.252 cpu1:4184)VMK_PCI: 728: device 005:00.0 capType 16 capIndex 76
Jul 23 16:45:36 aaiparisesx1 vmkernel: 0:00:02:36.393 cpu1:4186)VMK_PCI: 728: device 005:00.1 capType 16 capIndex 76
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.621 cpu0:4096)CosCommon: 2224: irq 4 vector 0x39 (host 0x49)
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.621 cpu0:4096)IDT: 1038: 0x39 for host
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.621 cpu0:4096)CosCommon: 2156: irq 4 vector 0x39 (host 0x49)
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.622 cpu0:4096)CosCommon: 2224: irq 3 vector 0x31 (host 0x41)
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.622 cpu0:4096)IDT: 1038: 0x31 for host
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.622 cpu0:4096)CosCommon: 2156: irq 3 vector 0x31 (host 0x41)
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.623 cpu0:4096)WARNING: NMP: nmp_IssueCommandToDevice: I/O could not be issued to device "mpx.vmhba33:C0:T0:L0" due to Not found
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.623 cpu0:4096)WARNING: NMP: nmp_DeviceRetryCommand: Device "mpx.vmhba33:C0:T0:L0": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Jul 23 16:45:39 aaiparisesx1 vmkernel: 0:00:02:39.623 cpu0:4096)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "mpx.vmhba33:C0:T0:L0" is blocked. Not starting I/O from device.
Jul 23 16:45:40 aaiparisesx1 vmkernel: 0:00:02:40.624 cpu0:4199)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba33:C0:T0:L0" - issuing command 0x41000610d200
Jul 23 16:45:40 aaiparisesx1 vmkernel: 0:00:02:40.624 cpu0:4199)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "mpx.vmhba33:C0:T0:L0" - failed to issue command due to Not found (APD), try again...
Jul 23 16:45:40 aaiparisesx1 vmkernel: 0:00:02:40.624 cpu0:4199)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "mpx.vmhba33:C0:T0:L0": awaiting fast path state update...
Jul 23 16:46:09 aaiparisesx1 vmkernel: 0:00:03:09.618 cpu0:4096)VMNIX: VmkDev: 2873: a/r=2 cmd=0x0 sn=456 dsk=vml0:2:0 reqbuf=0000000000000000 (sg=0)
Jul 23 16:46:09 aaiparisesx1 vmkernel: 0:00:03:09.618 cpu1:4105)ScsiDeviceIO: 747: Command 0x0 to device "mpx.vmhba33:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 23 16:46:09 aaiparisesx1 vmkernel: 0:00:03:09.618 cpu1:4105)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "mpx.vmhba33:C0:T0:L0" is blocked. Not starting I/O from device.
Jul 23 16:46:09 aaiparisesx1 vmkernel: 0:00:03:09.618 cpu0:4096)VMNIX: VmkDev: 2918: abort sn=456, vmkret=0.
Jul 23 16:46:09 aaiparisesx1 vmkernel: 0:00:03:09.624 cpu2:4199)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world restore device "mpx.vmhba33:C0:T0:L0" - no more commands to retry
Jul 23 16:46:14 aaiparisesx1 vmkernel: 0:00:03:14.618 cpu0:4096)VMNIX: VmkDev: 2966: abort succeeded.
Jul 23 16:46:14 aaiparisesx1 vmkernel: 0:00:03:14.618 cpu0:4096)WARNING: NMP: nmp_IssueCommandToDevice: I/O could not be issued to device "mpx.vmhba33:C0:T0:L0" due to Not found
Jul 23 16:46:14 aaiparisesx1 vmkernel: 0:00:03:14.618 cpu0:4096)WARNING: NMP: nmp_DeviceRetryCommand: Device "mpx.vmhba33:C0:T0:L0": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Jul 23 16:46:14 aaiparisesx1 vmkernel: 0:00:03:14.618 cpu0:4096)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "mpx.vmhba33:C0:T0:L0" is blocked. Not starting I/O from device.
Jul 23 16:46:44 aaiparisesx1 vmkernel: 0:00:03:44.618 cpu0:4096)VMNIX: VmkDev: 2873: a/r=2 cmd=0x0 sn=461 dsk=vml0:2:0 reqbuf=0000000000000000 (sg=0)
Jul 23 16:46:44 aaiparisesx1 vmkernel: 0:00:03:44.618 cpu1:4105)ScsiDeviceIO: 747: Command 0x0 to device "mpx.vmhba33:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Jul 23 16:46:44 aaiparisesx1 vmkernel: 0:00:03:44.618 cpu1:4105)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "mpx.vmhba33:C0:T0:L0" is blocked. Not starting I/O from device.
Jul 23 16:46:44 aaiparisesx1 vmkernel: 0:00:03:44.618 cpu0:4096)VMNIX: VmkDev: 2918: abort sn=461, vmkret=0.
Jul 23 16:46:44 aaiparisesx1 vmkernel: 0:00:03:44.618 cpu3:4199)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world restore device "mpx.vmhba33:C0:T0:L0" - no more commands to retry
Jul 23 16:46:49 aaiparisesx1 vmkernel: 0:00:03:49.618 cpu0:4096)VMNIX: VmkDev: 2966: abort succeeded.
Jul 23 16:46:49 aaiparisesx1 vmkernel: 0:00:03:49.618 cpu0:4096)WARNING: NMP: nmp_IssueCommandToDevice: I/O could not be issued to device "mpx.vmhba33:C0:T0:L0" due to Not found
Jul 23 16:46:49 aaiparisesx1 vmkernel: 0:00:03:49.618 cpu0:4096)WARNING: NMP: nmp_DeviceRetryCommand: Device "mpx.vmhba33:C0:T0:L0": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Jul 23 16:46:49 aaiparisesx1 vmkernel: 0:00:03:49.618 cpu0:4096)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "mpx.vmhba33:C0:T0:L0" is blocked. Not starting I/O from device.
Jul 23 16:46:50 aaiparisesx1 vmkernel: 0:00:03:50.556 cpu2:4098)<6>megasas_service_aen[2]: aen received
Jul 23 16:46:50 aaiparisesx1 vmkernel: 0:00:03:50.556 cpu3:4190)<6>megasas_hotplug_work[2]: event code 0x00f2
Jul 23 16:46:50 aaiparisesx1 vmkernel: 0:00:03:50.556 cpu3:4190)<6>megasas_hotplug_work[2]: aen registered

I will try to disable USB modules to see if it helps.

David,

http://www.davius.net

David, http://www.davius.net
0 Kudos
DaviuS
Contributor
Contributor

I finally solved my problem by disabling Dell Server Unified Configurator in iDRAC configuration.

It seems that the service console was trying to access storage used by this component on boot (maybe a CD-Rom emulator used by automatic OS deployment). Before this attempt timed-out, the ESX server is unavaible.

Disabling USB in the BIOS didn't helped. I've also tried a esxcfg-module -d usb-storage, without success... the COS still trying to access usb datastores.

For people with HP hardware, try to disable iLO.

I hope this helps.

David,

David, http://www.davius.net
0 Kudos
hkitagawa
Contributor
Contributor

Hi,

I had the same problem.

In my case,I solved.

I have 3hosts ESX4.0 on IBM x3650,vCenter on IBM x3550,

and DNS server work "bind" of RHEL4.4 on IBM x326m in my lab.

ESX4.0 and vCenter connect DNS server to resolve host names.

My lab cannot connect to internet,Closed LAN.

So,DNS cannot resolve recursive name.(cannot connect root of DNS)

in case above,VIClient cannnot connect about 15min to ESX4.0,

and vCenter too.

after about 15min,VIClient and vCenter can connect to ESX4.0.

Message below is /var/log/messaseg

ESX01 xinetd2810: START: vmware-authd pid=17419 from=127.0.0.1

ESX01 xinetd2810: EXIT: vmware-authd status=255 pid=17419 duration=0(sec)

ESX01 xinetd2810: START: vmware-authd pid=17420 from=127.0.0.1

ESX01 xinetd2810: EXIT: vmware-authd status=255 pid=17420 duration=0

-- this message logd about 15min per 3sec --

ESX01 xinetd2810: START: vmware-authd pid=17420 from=127.0.0.1

ESX01 xinetd2810: EXIT: vmware-authd status=0

ESX01 /usr/lib/vmware/bin/vmware-hostd3623: Accepted password for user vpxuser from 127.0.0.1

-- this message logd,I can connect to ESX4.0 --

DNS Setting of bind "named.conf".

options {

recursion yes;

}

I changed option of recursion "no",and daemon "named" restart.

Then,I can connect to ESX4.0 soon after booting(rebooting).

Message below is /var/log/messaseg

ESX01 xinetd2810: START: vmware-authd pid=17420 from=127.0.0.1

ESX01 xinetd2810: EXIT: vmware-authd status=0

ESX01 /usr/lib/vmware/bin/vmware-hostd3623: Accepted password for user vpxuser from 127.0.0.1

-- this message logd soon after booting,I can connetd to ESX4.0 --

I have not trying on Windows Server DNS.

If your DNS cannot connect to internet,try it please!

From japanese

0 Kudos
bobby311
Enthusiast
Enthusiast

DNS Setting of bind "named.conf".

options {

recursion yes;

}

I changed option of recursion "no",and daemon "named" restart.

Where is this named.conf file located?

much thanks!

0 Kudos
RParker
Immortal
Immortal

Well at first I thought this was a problem, but now I don't think it is. First of all you have to think server, lots of services, things going on in the background so server itself takes a few minutes even after a boot to completely be ready.

Virtual Center you have to consider is querying ALL the ESX hosts, just because you reboot and it takes 60 seconds, there is no guarantee software has any visibility into how long things take, so a reboot is assumed to be a cold start, so the service starts, NOW vCenter starts to interrogate the ESX hosts for inventory. The data is kept in the SQL database but you can't assume this data is accurate, because of the time lapse. So I don't think waiting 10 or 15 minutes for Virtual Center to query the servers is an extraordinary time to wait, especially when you think how many times are you really going to reboot virtual center?

The idea is to keep Virtual Center running and don't reboot, then when you do reboot take into account that server and services need to run to be FULLY active. I really don't see this as a problem.

0 Kudos
RParker
Immortal
Immortal

I have 3hosts ESX4.0 on IBM x3650,vCenter on IBM x3550,

If it works it works, if it doesn't it doesn't, I wouldn't go changing settings on ESX unless you know what you are doing, and since in our case it eventually does start, I think this is just virtual center needing time to actually get the information from the hosts, that's normal process, different from before. This Virtual Center is MUCH less prone to disconnects and it survives even small network glitches, the VC 2.5 did not, so I will take it.

VC 4.0 is better and this to me is just the way it works, normal functions.

0 Kudos