Hi!
I have a host (ESXi 6.0) that is disconnected from vCenter (VC). I started looking into the problem.
The hostd service doesn't start:
[root@esx-35:/var/run/vmware] /etc/init.d/hostd restart
watchdog-hostd: PID file /var/run/vmware/watchdog-hostd.PID does not exist
watchdog-hostd: Unable to terminate watchdog: No running watchdog process for hostd
sh: you need to specify whom to kill
Ramdisk 'hostd' with estimated size of 1803MB already exists
[root@esx-35:/var/run/vmware] /opt/vmware/vpxa/bin/vmware-watchdog -r hostd
-sh: /opt/vmware/vpxa/bin/vmware-watchdog: not found
[root@esx-35:/var/run/vmware] /sbin/watchdog.sh -r hostd
(no output)
[root@esx-35:/var/run/vmware] ls -l vmware-hostd.PID watchdog-hostd.PID
ls: watchdog-hostd.PID: No such file or directory
-rw-r--r-- 1 root root 8 Jan 15 13:46 vmware-hostd.PID
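To tell whether vmware-hostd.PID is stale (left behind by a hostd that is no longer running), you can test the PID it contains with kill -0. Below is a minimal sketch in POSIX sh. It assumes the file holds a single numeric PID and that you run it as root, as you normally do in the ESXi shell. The name pidfile_status is hypothetical and is not an ESXi tool.

```shell
#!/bin/sh
# Hypothetical helper: report whether a PID file points at a live process.
# Prints "missing", "running" or "stale" and returns 2, 0 or 1 respectively.
# Assumes the file holds one numeric PID and that we run as root, so
# kill -0 only fails when the process does not exist.
pidfile_status() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 2
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only checks that the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
        return 0
    fi
    echo "stale"
    return 1
}
```

Example: pidfile_status /var/run/vmware/vmware-hostd.PID. A "stale" result means the file outlived the hostd process it refers to.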
hostd.log:
2019-01-15T10:00:16.218Z error hostd[676C1B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x667580a4, h:36, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2019-01-15T10:02:59.391Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x6665fc6c, h:31, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2019-01-15T10:03:24.108Z error hostd[674BAB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-22-71b7 user=root] GetPrimitiveParam: Cannot find (help)
2019-01-15T10:03:24.408Z error hostd[674FBB70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-a0-71cb user=root] GetPrimitiveParam: Cannot find (help)
2019-01-15T10:03:24.926Z error hostd[67CC4B70] [Originator@6876 sub=SoapAdapter.HTTPService.HttpConnection] Failed to read header on stream <io_obj p:0x67a3a74c, h:34, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2019-01-15T10:03:24.977Z error hostd[67CC4B70] [Originator@6876 sub=Solo.VmwareCLI opID=esxcli-e7-71db user=root] GetPrimitiveParam: Cannot find (help)
2019-01-15T14:47:58.335Z warning -[FFA75B20] [Originator@6876 sub=Default] Estimated fds limit 4864 > 4096 max supported by setrlimit. Setting fds limit to 4096
2019-01-15T14:47:58.336Z warning hostd[FFA75B20] [Originator@6876 sub=Default] Unrecognized log/level '' using 'info'
2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'
2019-01-15T14:47:58.380Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'
2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'
2019-01-15T14:47:58.386Z warning hostd[FFA75B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'
I found these KB articles:
https://kb.vmware.com/s/article/1005566
https://kb.vmware.com/s/article/1003490?1=
In my case I use LACP. In KB 1003490 I see this:
I ran the "services.sh restart" command on this host and on other hosts. The other hosts are OK, but this host has gone crazy :)
Any ideas?
P.S. I can't reboot the host.
The problem is solved!
Look at my hostd.log:
2019-01-17T09:33:59.202Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Removing duplicate pools.xml entry 'resourcePool[0003]'
2019-01-17T09:33:59.203Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1'
2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277'
2019-01-17T09:33:59.208Z warning hostd[FFC95B20] [Originator@6876 sub=Hostsvc] Destroying unregistered VMkernel resource group 'host/user/pool2/pool1/vmx.15702277/worldGroup.15702277'
error: Sysinfo error on operation returned status : Operation not permitted. Please see the VMkernel log for detailed error information
See the first line: Removing duplicate pools.xml entry 'resourcePool[0003]'.
Then look at the pools.xml file:
<resourcePool id="0002">
<lastModified>2018-12-18T14:14:46.494756Z</lastModified>
<name>EIS</name>
<objID>pool1</objID>
<path>host/user/pool2</path>
</resourcePool>
<resourcePool id="0003">
<lastModified>2018-12-18T14:14:46.508533Z</lastModified>
<name>zak.local</name>
<objID>pool2</objID>
<path>host/user/pool2</path>
The two <path> lines (bold in my original post) must be different! But in my case they were identical, so I looked at the same file on another host.
On the other hosts, the zak.local resource pool has three levels: host/user/pool2.
But the EIS pool has four levels on the other hosts, so on the problem host its path should be:
host/user/pool2/pool1.
The final pool1 comes from <objID>pool1</objID>.
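You can spot the duplicate that hostd complains about without reading the whole file: extract every <path> value and print the ones that occur more than once. Below is a minimal sketch. It assumes each <path> element sits on its own line, as in the excerpt above. duplicate_pool_paths is a hypothetical helper name, and you pass it the location of the pools.xml file on your host.

```shell
#!/bin/sh
# Hypothetical helper: print every <path> value that occurs more than once
# in a pools.xml file. Assumes each <path>...</path> is on a single line.
duplicate_pool_paths() {
    # -n plus the trailing p flag: print only lines where the substitution
    # matched, i.e. just the text between <path> and </path>.
    # ':' is used as the s delimiter because the values contain '/'.
    sed -n 's:^[[:space:]]*<path>\(.*\)</path>[[:space:]]*$:\1:p' "$1" |
        sort | uniq -d
}
```

With only the two entries shown above, it prints host/user/pool2. Once the EIS path is corrected, it prints nothing.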
Final fix:
<resourcePool id="0002">
<lastModified>2018-12-18T14:14:46.494756Z</lastModified>
<name>EIS</name>
<objID>pool1</objID>
<path>host/user/pool2/pool1</path>
In any case, compare with the same file on your other hosts to understand how it is structured.
Then I restarted vpxa and hostd and boom!! The host is alive!
Without rebooting the host!!!
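The rule worked out above is that a pool's <path> ends with its own <objID>. zak.local (objID pool2) has path host/user/pool2, and EIS (objID pool1) should have host/user/pool2/pool1. That rule can be checked mechanically. This awk sketch is only a diagnostic aid, not a VMware tool. The rule is inferred from the entries in this thread alone, check_pool_paths is a hypothetical name, and the sketch assumes one element per line with each entry closed by </resourcePool>. Compare its output with a healthy host before editing anything.

```shell
#!/bin/sh
# Hypothetical helper: flag <resourcePool> entries in a pools.xml file whose
# <path> does not end with the pool's own <objID>. The rule is inferred from
# this thread only; assumes one element per line and a closing </resourcePool>.
check_pool_paths() {
    awk '
        # Return the text between <tag> and </tag> on the current line.
        function tagval(tag,    v) {
            v = $0
            sub(".*<" tag ">", "", v)
            sub("</" tag ">.*", "", v)
            return v
        }
        # Start of a new entry: forget the previous values.
        /<resourcePool[ >]/ { name = ""; obj = ""; path = "" }
        /<name>/            { name = tagval("name") }
        /<objID>/           { obj = tagval("objID") }
        /<path>/            { path = tagval("path") }
        # End of entry: compare the last path component with objID.
        /<\/resourcePool>/ {
            n = split(path, parts, "/")
            if (n == 0 || parts[n] != obj)
                printf "MISMATCH %s: objID=%s path=%s\n", name, obj, path
        }
    ' "$1"
}
```

On the original entries, it prints "MISMATCH EIS: objID=pool1 path=host/user/pool2". On the fixed file, it prints nothing.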
Have you tried removing it from vCenter and re-adding it? The host should remain online while this is happening; just make sure you know the root password before you do this.
You have to run these commands twice to fix the issue: the first run will fail and the second run will do the job.
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
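To script the advice above, you can use a small wrapper that runs each restart and tries again once, only if the first attempt fails. This is a sketch. It assumes the init scripts report failure through a non-zero exit status, which this thread does not confirm; if they don't, just run both commands a second time by hand. run_with_one_retry is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: run a command; if it exits non-zero, run it once more
# and return the exit status of that second attempt.
run_with_one_retry() {
    "$@" && return 0
    echo "first attempt failed, retrying: $*" >&2
    "$@"
}
```

Usage:
run_with_one_retry /etc/init.d/hostd restart
run_with_one_retry /etc/init.d/vpxa restart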