IKirill
Enthusiast

Host disconnected from VC and vmware-watchdog: not found

Hi!

I have a host (ESXi 6.0) that is disconnected from vCenter (VC). I started investigating the problem.

The hostd service doesn't start.

[root@esx-35:/var/run/vmware] /etc/init.d/hostd restart

watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist

watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd

sh: you need to specify whom to kill

Ramdisk 'hostd' with estimated size of 1803MB already exists

[root@esx-35:/var/run/vmware] /opt/vmware/vpxa/bin/vmware-watchdog -r hostd

-sh: /opt/vmware/vpxa/bin/vmware-watchdog: not found

[root@esx-35:/var/run/vmware] /sbin/watchdog.sh -r hostd

(no output)

[root@esx-35:/var/run/vmware] ls -l vmware-hostd.PID watchdog-hostd.PID

ls: watchdog-hostd.PID: No such file or directory

-rw-r--r--    1 root     root             8 Jan 15 13:46 vmware-hostd.PID

hostd.log

2019-01-15T10:00:16.218Z error hostd[676C1B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x667580a4, h:36, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

2019-01-15T10:02:59.391Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x6665fc6c, h:31, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

2019-01-15T10:03:24.108Z error hostd[674BAB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-22-71b7 user=root] GetPrimitiveParam: Cannot find (help)

2019-01-15T10:03:24.408Z error hostd[674FBB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-a0-71cb user=root] GetPrimitiveParam: Cannot find (help)

2019-01-15T10:03:24.926Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x67a3a74c, h:34, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)

2019-01-15T10:03:24.977Z error hostd[67CC4B70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-e7-71db user=root] GetPrimitiveParam: Cannot find (help)

2019-01-15T14:47:58.335Z warning -[FFA75B20] [Originator@6876 sub=Default] Estimated fds limit 4864 > 4096 max supported by setrlimit. Setting fds limit to 4096

2019-01-15T14:47:58.336Z warning hostd[FFA75B20] [Originator@6876 sub=Default] Unrecognized log/level '' using 'info'

2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'

2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'

2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'

2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'

I found these KB articles:

https://kb.vmware.com/s/article/1005566

https://kb.vmware.com/s/article/1003490?1=

In my case I use LACP. KB 1003490 says this:

  • If LACP is enabled and configured, do not restart management services using services.sh command. Instead restart independent services using the /etc/init.d/module restart command.

I ran "services.sh restart" on this host and on other hosts. The other hosts are OK, but this host has gone crazy.

Any ideas?

P.S. I can't reboot the host.


Accepted Solutions
IKirill
Enthusiast

The problem is solved!

Look at my hostd.log:

2019-01-17T09:33:59.202Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'

2019-01-17T09:33:59.203Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'

2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'

2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'

error: Sysinfo error on operation returned status : Operation not permitted. Please see the VMkernel log for detailed error information

See the first line: Removing duplicate pools.xml entry 'resourcePool[0003]'

Then look at the pools.xml file:

<resourcePool id="0002">

    <lastModified>2018-12-18T14:14:46.494756Z</lastModified>

    <name>EIS</name>

    <objID>pool1</objID>

    <path>host/user/pool2</path>

  </resourcePool>

  <resourcePool id="0003">

    <lastModified>2018-12-18T14:14:46.508533Z</lastModified>

    <name>zak.local</name>

    <objID>pool2</objID>

    <path>host/user/pool2</path>

The two <path> lines (shown in bold in the original post) must be different! In my case they were not. I compared this file with the one on another host.

On the other hosts, the zak.local resource pool has three levels of nesting: host, user and pool2.

The EIS pool, however, has four levels on the other hosts, so on the problem host its path should be:

host/user/pool2/pool1

pool1 is taken from <objID>pool1</objID>.
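
To spot this kind of collision without reading the whole file, the <path> values can be checked for duplicates. This is only a minimal sketch, not a VMware tool: find_duplicate_paths is a hypothetical helper name, the file location in the usage comment is an assumption, and it assumes that each <path> element sits on its own line (as in the excerpt above) and that awk is available in the host's shell.

```shell
#!/bin/sh
# Print every <path> value that occurs more than once in a pools.xml-style file.
# Assumption: each <path>...</path> element is on its own line.
find_duplicate_paths() {
    # Split each line on '<' and '>': $2 is the tag name, $3 is the tag's text.
    # A value is printed once, on its second occurrence.
    awk -F'[<>]' '$2 == "path" { if (seen[$3]++ == 1) print $3 }' "$1"
}

# Hypothetical usage; the location below is an assumption, adjust it for your host:
# find_duplicate_paths /etc/vmware/hostd/pools.xml
```

Run against the broken excerpt above, this would print host/user/pool2; once the EIS entry has its own path, it prints nothing. Making a backup copy of the file before editing it by hand is a sensible precaution.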

Final fix:

<resourcePool id="0002">

    <lastModified>2018-12-18T14:14:46.494756Z</lastModified>

    <name>EIS</name>

    <objID>pool1</objID>

    <path>host/user/pool2/pool1</path>

In any case, look at the other hosts to understand the logic of this file.

Then I restarted vpxa and hostd and boom! The host is alive!

Without rebooting the host!

3 Replies
kenbshinn
Enthusiast

Have you tried removing it from vCenter and re-adding it? The host should remain online while this is happening. Just make sure you know the root password before you do this.

irfan2dharma1
Contributor

You have to run these commands twice to fix the issue. The first run will fail and the second run will do the job.

/etc/init.d/hostd restart

/etc/init.d/vpxa restart
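
The "run it twice" advice can be wrapped in a small retry helper. This is a sketch under assumptions: retry_cmd is a hypothetical name, and it only helps if the init scripts return a non-zero exit status when the restart fails, which is not confirmed in this thread.

```shell
#!/bin/sh
# Run a command, retrying until it succeeds or the attempt limit is reached.
# retry_cmd is a hypothetical helper, not part of ESXi.
retry_cmd() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # "$@" is the command to run, with its arguments preserved.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical usage on the host: give each restart two attempts.
# retry_cmd 2 /etc/init.d/hostd restart
# retry_cmd 2 /etc/init.d/vpxa restart
```

If a restart still fails after the second attempt, the helper returns 1, and hostd.log is the place to look next.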
