VMware Cloud Community
mark_dormady
Contributor

HELP HOST CRASHED!!! Core dump questions

I had a host crash earlier, and my Gold support won't kick in for a few more hours. The server comes back up, but I cannot connect to it via VirtualCenter ("Connection failed"). I am running ESX 3.0, and all my VMs are on LUNs. What tricks can I use to get the host connected back to VirtualCenter? I am thinking a few services are not running.

patk
Contributor

Do you know whether the service console is accessible? For example, can you SSH to the server? If so, check whether the mgmt-vmware service is running by issuing the following command:

service mgmt-vmware status

If the service is not started try issuing:

service mgmt-vmware start
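
If you find yourself repeating this check, the two commands can be combined into a small shell helper. This is only a sketch: the function name `hostd_is_running` is made up for illustration, and the pattern assumes the status line has the form `vmware-hostd (pid NNNN) is running...`, as in the console output posted further down this thread.

```shell
# hostd_is_running: reads the output of `service mgmt-vmware status` on stdin.
# Succeeds (exit 0) only if a "vmware-hostd (pid N) is running" line is present.
# Hypothetical helper name; the matched wording is taken from this thread.
hostd_is_running() {
    grep -q 'vmware-hostd (pid [0-9][0-9]*) is running'
}

# Example use in the service console (as root):
#   service mgmt-vmware status | hostd_is_running || service mgmt-vmware start
```

Keep in mind that a running hostd only tells you the host agent is up; it does not by itself prove that VC can reach the host.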

Also, in the event that you do need to open a case with VMware, please be familiar with this article:

Collecting Diagnostic Information for VMware ESX Server Problems

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=653

mark_dormady
Contributor

Yes, I can SSH to the server. It appears to reboot normally. I did the following:

[root@hs01ap92 vmware]# service mgmt-vmware status
vmware-hostd is stopped
[root@hs01ap92 vmware]#
[root@hs01ap92 vmware]# service mgmt-vmware start
Starting VMware ESX Server Management services:
VMware ESX Server Host Agent (background) [ OK ]
Availability report startup (background) [ OK ]
[root@hs01ap92 vmware]#
[root@hs01ap92 vmware]# service mgmt-vmware status
vmware-hostd (pid 5001) is running...

I still cannot connect via VC.

Jazzer
Enthusiast

Perhaps you need to disconnect the host from VC and then re-add it.

mark_dormady
Contributor

Tried that... No luck. I get:

"Unable to access the specified host. The server software is not responding, or there is a network problem."

The network shows this:

[root@hs01ap92 vmware]# ifconfig -a
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:3006 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3006 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:561166 (548.0 Kb) TX bytes:561166 (548.0 Kb)

vmnic0    Link encap:Ethernet HWaddr 00:14:5E:1C:A8:54
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:138381 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91144 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15309946 (14.6 Mb) TX bytes:99005124 (94.4 Mb)
          Interrupt:105

vmnic1    Link encap:Ethernet HWaddr 00:14:5E:1C:A8:55
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:74079 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10885093 (10.3 Mb) TX bytes:0 (0.0 b)
          Interrupt:113

vswif0    Link encap:Ethernet HWaddr 00:50:56:49:42:3C
          inet addr:10.2.1.41 Bcast:10.2.3.255 Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:134572 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13505737 (12.8 Mb) TX bytes:98920538 (94.3 Mb)

patk
Contributor

Could this possibly be an issue with DNS? Have you tried adding the host to VC via IP address only? Also, is there anything preventing traffic on UDP port 902 between your ESX host and VC box? As a test, see if you can manually enable that port in the firewall. The command you would be looking for is:

esxcfg-firewall

You can view available parameters using:

esxcfg-firewall --help
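
For reference, a rule for `esxcfg-firewall --openPort` is written as `port,protocol,direction,name`. The sketch below only assembles and sanity-checks that argument; it does not touch the firewall itself. The helper name `open_port_arg`, the rule name `vcHeartbeat`, and the choice of the outgoing direction are assumptions for illustration, so confirm them against the `--help` output on your host before running anything.

```shell
# open_port_arg PORT PROTO DIRECTION NAME
# Prints "PORT,PROTO,DIRECTION,NAME" if the arguments look sane, else fails.
# Hypothetical helper; it never calls esxcfg-firewall on its own.
open_port_arg() {
    case "$1" in ''|*[!0-9]*) return 1 ;; esac   # port must be all digits
    case "$2" in tcp|udp) ;; *) return 1 ;; esac  # protocol
    case "$3" in in|out) ;; *) return 1 ;; esac   # direction
    [ -n "$4" ] || return 1                       # rule name must not be empty
    printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# Example use in the service console (as root); rule name is an assumption:
#   esxcfg-firewall --openPort "$(open_port_arg 902 udp out vcHeartbeat)"
```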

Good Luck,

Pat

mark_dormady
Contributor

I don't think it's a DNS problem. I had another host go down earlier today. I may have a disk or firmware problem. /var/log/messages keeps showing this:

watchdog-hostd: Terminating watchdog with PID 1806
May 1 10:39:57 hs01ap87 watchdog-hostd: [1806] Signal received: exiting the watchdog
May 1 10:39:57 hs01ap87 /usr/lib/vmware/hostd/vmware-hostd[1811]: Accepted password for user root from 127.0.0.1
May 1 10:39:58 hs01ap87 watchdog-vpxa: '/usr/sbin/vpxa' exited after 291 seconds
May 1 10:39:58 hs01ap87 watchdog-vpxa: Executing '/usr/sbin/vpxa'
May 1 10:39:58 hs01ap87 watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID not found
May 1 10:39:58 hs01ap87 VMware[init]: [2191] Begin '/usr/sbin/vmware-hostd -u -a', min-uptime = 60, max-quick-failures = 5, max-total-failures = 1000000
May 1 10:39:58 hs01ap87 VMware[init]: connect: No such file or directory.
May 1 10:39:58 hs01ap87 VMware[init]: connect: No such file or directory.
May 1 10:40:58 hs01ap87 /usr/lib/vmware/hostd/vmware-hostd[2196]: Accepted password for user vpxuser from 127.0.0.1
May 1 11:01:02 hs01ap87 syslogd 1.4.1: restart.
May 1 11:01:07 hs01ap87 kernel: loop: loaded (max 8 devices)

THIS ONE:

May 1 11:01:07 hs01ap87 kernel: EXT2-fs warning: maximal mount count reached, running e2fsck is recommended
May 1 12:01:05 hs01ap87 last message repeated 2 times.
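
The excerpt also shows the vpxa watchdog relaunching `/usr/sbin/vpxa` after it exited. To see whether an agent is exiting repeatedly, one option is to count those watchdog lines. A minimal sketch, assuming the log keeps the `watchdog-<agent>: '<path>' exited after N seconds` wording seen above; `count_agent_exits` is a hypothetical helper name:

```shell
# count_agent_exits AGENT: reads syslog text on stdin and prints how many
# "watchdog-AGENT: ... exited after N seconds" lines it contains.
# Note: grep -c still prints 0 when nothing matches, but returns non-zero.
count_agent_exits() {
    grep -c "watchdog-$1: .* exited after [0-9][0-9]* seconds"
}

# Example use on the host:
#   count_agent_exits vpxa < /var/log/messages
```

A count that keeps climbing would point at the agent rather than at the network, though that is an inference and not something this log alone proves.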

I set the interval to 100 (tune2fs -C 100 /dev/sda_devices). However, I still believe I have a problem.
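
One thing worth double-checking here: in `tune2fs`, lowercase `-c` sets the maximum mount count, while uppercase `-C` sets the current mount count, so the two are easy to mix up. The sketch below reads `tune2fs -l` output and reports whether the current count has reached the maximum, which is the condition the EXT2-fs warning describes. The helper name `mount_count_exceeded` and the device path in the usage comment are placeholders, not values from this host.

```shell
# mount_count_exceeded: reads `tune2fs -l` output on stdin.
# Exit 0 if "Mount count" >= "Maximum mount count" (and the maximum is
# positive, i.e. the check is enabled); exit 1 otherwise.
mount_count_exceeded() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END { exit !(max > 0 && count >= max) }
    '
}

# Example use (device path is a placeholder):
#   tune2fs -l /dev/sdXN | mount_count_exceeded && echo "e2fsck recommended"
```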

JonT
Enthusiast

Mark, are you able to connect to the host directly using the VI client? I have a host that also shows the "kernel: EXT2-fs" warning you are seeing, and I have had no problems so far. I will get around to fixing that eventually. It almost sounds like the VirtualCenter agent may be corrupted or not set up properly on your host. I remember reading a thread a while ago about deleting and recreating the vpxa agent, but that would only apply if you remove the host from VC and then cannot add it back in.

mark_dormady
Contributor

That was my problem. I could SSH to the host but couldn't connect via VC. I removed it and tried to add it back. No luck. That makes sense about the agent, and I would like to read up on that thread. However, since I was in a hurry to get the host back into the mix, I rebuilt the server and added it back into the cluster. Since all my VMs were on the SAN, including the ".vmx" files, it was a quick and easy process.

I still don't really know what caused it to crash. VMware support, so far, has not been any help at all. It's like asking a "black hole" for answers. Anyway, I believe I may be at full capacity. I am running 38 VMs on 12 CPUs. Most are SQL VMs (not my choice). Thanks, guys, for your input.
