Le_Tadlo
Contributor

ESXi hosts periodically trying to reconnect to old NFS shares

Hello all,

I noticed three of my ESXi hosts are trying to connect to old NFS shares that were removed.

They are not defined in the GUI, nor do they appear in esxcli storage nfs list or esxcli storage core device detached list.

However, I found them defined in /etc/vmware/esx.conf

I tried manually editing them out and restarting vpxa, hostd, and nfsgssd, but the host still shows attempts to connect to the removed NFS shares. I suppose rebooting the hosts would help, but I don't want to do that for obvious reasons.
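
For reference, the restarts were done from the ESXi shell roughly like this (a sketch; the nfsgssd init script is assumed to be present, which is only the case on builds with NFS 4.1 Kerberos support):

/etc/init.d/hostd restart
/etc/init.d/vpxa restart
/etc/init.d/nfsgssd restart   # assumed present; ships with NFS 4.1 Kerberos support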

Lalegre
Virtuoso

I think you were following the correct procedure; editing esx.conf should address this behaviour.

Try running services.sh restart. It causes no downtime; it only restarts the management services without rebooting the ESXi host, and no virtual machine will be affected.

Nawals
Expert

Have you rescanned the storage? If not, please do and try again. If the same issue persists even after that, please reboot the ESXi hosts to fix it.
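
For example, a rescan of all adapters can be triggered from the shell with a standard esxcli command:

esxcli storage core adapter rescan --all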

Le_Tadlo
Contributor

[root@ESXi-IBM-1:~] services.sh restart & tail -f /var/log/jumpstart-stdout.log

2019-12-02T11:46:44.111Z| executing start plugin: lacp

2019-12-02T11:46:44.314Z| executing start plugin: memscrubd

2019-12-02T11:46:44.517Z| executing start plugin: smartd

2019-12-02T11:46:44.721Z| executing start plugin: vpxa

2019-12-02T11:46:44.922Z| executing start plugin: lwsmd

2019-12-02T11:46:48.146Z| executing start plugin: sfcbd-watchdog

2019-12-02T11:46:48.550Z| executing start plugin: wsman

2019-12-02T11:46:48.754Z| executing start plugin: snmpd

2019-12-02T11:46:49.966Z| executing start plugin: xorg

2019-12-02T11:46:50.369Z| executing start plugin: vmtoolsd

2020-03-18T08:16:17.493Z| executing stop for daemon xorg.

2020-03-18T08:16:17.695Z| executing stop for daemon vmsyslogd.

2020-03-18T08:16:17.897Z| Jumpstart failed to stop: vmsyslogd reason: Execution of command: /etc/init.d/vmsyslogd stop failed with status: 1

2020-03-18T08:16:17.897Z| executing stop for daemon vmtoolsd.

2020-03-18T08:16:18.299Z| Jumpstart failed to stop: vmtoolsd reason: Execution of command: /etc/init.d/vmtoolsd stop failed with status: 1

2020-03-18T08:16:18.299Z| executing stop for daemon wsman.

2020-03-18T08:16:18.502Z| executing stop for daemon snmpd.

2020-03-18T08:16:19.505Z| executing stop for daemon sfcbd-watchdog.

2020-03-18T08:16:21.111Z| executing stop for daemon lwsmd.

2020-03-18T08:16:23.921Z| executing stop for daemon vpxa.

2020-03-18T08:16:24.322Z| executing stop for daemon vobd.

2020-03-18T08:16:24.723Z| executing stop for daemon dcbd.

2020-03-18T08:16:24.924Z| executing stop for daemon cdp.

2020-03-18T08:16:25.327Z| executing stop for daemon nscd.

2020-03-18T08:16:25.730Z| executing stop for daemon lacp.

2020-03-18T08:16:26.132Z| executing stop for daemon memscrubd.

2020-03-18T08:16:26.334Z| Jumpstart failed to stop: memscrubd reason: Execution of command: /etc/init.d/memscrubd stop failed with status: 3

2020-03-18T08:16:26.334Z| executing stop for daemon smartd.

2020-03-18T08:16:26.735Z| executing stop for daemon slpd.

2020-03-18T08:16:26.936Z| executing stop for daemon sdrsInjector.

2020-03-18T08:16:27.339Z| executing stop for daemon storageRM.

2020-03-18T08:16:27.740Z| executing stop for daemon vvold.

2020-03-18T08:16:27.942Z| Jumpstart failed to stop: vvold reason: Execution of command: /etc/init.d/vvold stop failed with status: 3

2020-03-18T08:16:27.942Z| executing stop for daemon hostdCgiServer.

2020-03-18T08:16:28.345Z| executing stop for daemon sensord.

2020-03-18T08:16:29.146Z| executing stop for daemon lbtd.

2020-03-18T08:16:29.548Z| executing stop for daemon hostd.

2020-03-18T08:16:29.950Z| executing stop for daemon rhttpproxy.

2020-03-18T08:16:30.351Z| executing stop for daemon nfcd.

2020-03-18T08:16:30.552Z| executing stop for daemon vmfstraced.

2020-03-18T08:16:30.954Z| executing stop for daemon rabbitmqproxy.

2020-03-18T08:16:31.155Z| executing stop for daemon esxui.

2020-03-18T08:16:31.358Z| executing stop for daemon usbarbitrator.

2020-03-18T08:16:31.759Z| executing stop for daemon iofilterd-spm.

2020-03-18T08:16:31.960Z| executing stop for daemon swapobjd.

2020-03-18T08:16:32.565Z| executing stop for daemon iofilterd-vmwarevmcrypt.

2020-03-18T08:16:32.766Z| executing stop for daemon SSH.

2020-03-18T08:16:32.967Z| executing stop for daemon DCUI.

2020-03-18T08:16:33.169Z| executing stop for daemon ntpd.

Errors:

Invalid operation requested: This ruleset is required and connot be disabled

2020-03-18T08:16:35.686Z| executing start plugin: SSH

2020-03-18T08:16:35.887Z| executing start plugin: DCUI

2020-03-18T08:16:36.088Z| executing start plugin: ntpd

2020-03-18T08:16:36.483Z| executing start plugin: esxui

2020-03-18T08:16:37.688Z| executing start plugin: usbarbitrator

2020-03-18T08:16:39.492Z| executing start plugin: iofilterd-spm

2020-03-18T08:16:39.895Z| executing start plugin: swapobjd

2020-03-18T08:16:40.297Z| executing start plugin: iofilterd-vmwarevmcrypt

2020-03-18T08:16:40.700Z| executing start plugin: sdrsInjector

2020-03-18T08:16:40.901Z| executing start plugin: storageRM

2020-03-18T08:16:41.102Z| executing start plugin: vvold

2020-03-18T08:16:43.110Z| executing start plugin: hostdCgiServer

2020-03-18T08:16:43.311Z| executing start plugin: sensord

2020-03-18T08:16:43.712Z| executing start plugin: lbtd

2020-03-18T08:16:43.914Z| executing start plugin: hostd

2020-03-18T08:16:44.918Z| executing start plugin: rhttpproxy

2020-03-18T08:16:45.321Z| executing start plugin: nfcd

2020-03-18T08:16:45.522Z| executing start plugin: vmfstraced

2020-03-18T08:16:45.724Z| executing start plugin: rabbitmqproxy

2020-03-18T08:16:46.728Z| executing start plugin: slpd

2020-03-18T08:16:46.929Z| executing start plugin: dcbd

2020-03-18T08:16:47.130Z| executing start plugin: cdp

2020-03-18T08:16:47.332Z| executing start plugin: nscd

2020-03-18T08:16:47.533Z| executing start plugin: lacp

2020-03-18T08:16:47.736Z| executing start plugin: memscrubd

2020-03-18T08:16:47.938Z| executing start plugin: smartd

2020-03-18T08:16:48.140Z| executing start plugin: vpxa

2020-03-18T08:16:48.342Z| executing start plugin: lwsmd

2020-03-18T08:16:51.552Z| executing start plugin: sfcbd-watchdog

2020-03-18T08:16:51.955Z| executing start plugin: wsman

2020-03-18T08:16:52.356Z| executing start plugin: snmpd

2020-03-18T08:16:53.360Z| executing start plugin: xorg

2020-03-18T08:16:53.763Z| executing start plugin: vmtoolsd

[1]+  Done                       services.sh restart

Unfortunately, that doesn't seem to work either. Still seeing attempts to connect.

Lalegre
Virtuoso

Please try the following command:

/usr/sbin/auto-backup.sh

It will persist the changes on the ESXi host without restarting it.
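
A minimal sketch, assuming the stale NFS entries live under the /nas/ tree in esx.conf:

grep "^/nas/" /etc/vmware/esx.conf   # verify the stale entries are really gone
/usr/sbin/auto-backup.sh             # persist the edited configuration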

Nawals
Expert

Maybe this can be resolved (worked around), at least temporarily, by recreating the mountpoint/share on the NAS and then, once the ESXi host has mounted the share, unmounting it from the ESXi host.
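
As an illustration, on a Linux-based NAS the temporary export could look like this in /etc/exports (hypothetical subnet and options; match the original share settings where possible):

/mnt/Rko  10.33.0.0/16(rw,sync,no_root_squash)

Then run exportfs -ra to re-export without restarting the NFS server.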

Le_Tadlo
Contributor

OK, I created the temporary mountpoints on the NFS server and the hosts did connect ...

The hosts are managed by vCenter; how can I correctly unmount the shares to avoid any further issues?

Nawals
Expert

So the hosts are connected now, right?

Check the NFS datastores: esxcli storage nfs list

Remove the NFS datastore: esxcli storage nfs remove -v NFS_Datastore_Name or esxcfg-nas -d NFS_Datastore_Name

Le_Tadlo
Contributor

Yeah, about that

[root@ESXi-IBM-1:~] esxcli storage nfs list

Volume Name  Host        Share                       Accessible  Mounted  Read-Only   isPE  Hardware Acceleration

-----------  ----------  --------------------------  ----------  -------  ---------  -----  ---------------------

Rko          10.33.11.5  /mnt/Rko                          true     true      false  false  Not Supported

BackupStor   10.33.11.5  /mnt/datastore2/nfs/vmware        true     true      false  false  Not Supported

[root@ESXi-IBM-1:~] esxcli storage nfs remove -v Rko

Cannot unmount Rko: not mounted with NFS

[root@ESXi-IBM-1:~]

Maybe the issue now is that I removed them from esx.conf?

Nawals
Expert

OK, so the issue now is that /mnt/Rko, the old share which no longer exists in the environment, is still showing, right? If yes, can you please confirm whether it was removed from the NFS server as well, or is it still kept there?

Le_Tadlo
Contributor

Yes, I need to remove the old shares

/mnt/Rko

/mnt/data1/nfs/vmware

which no longer exist (I temporarily created shares with full read/write permissions)

Nawals
Expert

Can you please share the output for both the old and the new shares?

Lalegre
Virtuoso

Hey

Try the following KB: VMware Knowledge Base

Try both commands and also stop the service as mentioned there.

Nawals
Expert

Display the list of NFS datastores on the system:

esxcli storage nfs list

esxcli storage nfs remove -v datastore_nfs02

If the NFS datastore still appears in the vSphere Client, click the Refresh button in the ESXi storage section (Configuration -> Storage).

Note: this has to be done on every ESXi host where you need to remove the inactive storage.

Run this command to stop the SIOC service:

/etc/init.d/storageRM stop

In the vSphere Client, select the host and then click the Configuration tab.

Click Rescan All.

After the rescan completes, run this command to restart the SIOC service:

/etc/init.d/storageRM start

Run this command to unmount the NFS datastore:

esxcli storage nfs remove -v datastore_nfs02


Le_Tadlo
Contributor

[root@ESXi-R510-1:~] esxcli storage nfs list

Volume Name   Host        Share                       Accessible  Mounted  Read-Only   isPE  Hardware Acceleration

------------  ----------  --------------------------  ----------  -------  ---------  -----  ---------------------

IBM_StorSrv1  10.33.11.5  /mnt/data1/nfs/vmware             true     true      false  false  Not Supported

BackupStor    10.33.11.5  /mnt/datastore2/nfs/vmware        true     true      false  false  Not Supported

[root@ESXi-R510-1:~]

[root@ESXi-R510-1:~]

[root@ESXi-R510-1:~] /etc/init.d/storageRM stop

watchdog-storageRM: Terminating watchdog process with PID 1873707

storageRM stopped

RESCANNED in vCenter

[root@ESXi-R510-1:~] /etc/init.d/storageRM start

storageRM started

[root@ESXi-R510-1:~] esxcli storage nfs list

Volume Name   Host        Share                       Accessible  Mounted  Read-Only   isPE  Hardware Acceleration

------------  ----------  --------------------------  ----------  -------  ---------  -----  ---------------------

IBM_StorSrv1  10.33.11.5  /mnt/data1/nfs/vmware             true     true      false  false  Not Supported

BackupStor    10.33.11.5  /mnt/datastore2/nfs/vmware        true     true      false  false  Not Supported

[root@ESXi-R510-1:~] esxcli storage nfs remove -v IBM_StorSrv1

Cannot unmount IBM_StorSrv1: not mounted with NFS

[root@ESXi-R510-1:~]

I rebooted one of the hypervisors and the share is gone ... unfortunately, I cannot do the same for all 3.

Nawals
Expert

Great, your issue is resolved. You can do the same on the other hosts as well.


Lalegre
Virtuoso

Have you tried the command I mentioned before?

/usr/sbin/auto-backup.sh

Run it after deleting the NFS lines from esx.conf.

Nawals
Expert

Configuration information for the host is kept in several configuration files within the /etc/ directory on an ESXi host. If changes are made directly to these configuration files, they do not persist across reboots unless they are backed up to the boot device.

Configuration changes to files on an ESXi Installable or Embedded host installation are retained in the state.tgz file on the boot device. The state.tgz file contains a copy of the configuration file /etc/vmware/esx.conf, which is consulted early in the startup process, prior to loading drivers.

The state.tgz file is regenerated automatically under two conditions:

Every hour, a periodic cron job (/var/spool/cron/crontabs/root) runs the command /sbin/auto-backup.sh, which updates the state.tgz file if any configuration has changed.

During a graceful shutdown or restart, the script /sbin/shutdown.sh runs the command /sbin/backup.sh 1, which updates the state.tgz file.

If the state.tgz file is not updated automatically following a configuration change and the host is shut down or restarted uncleanly, the state.tgz file will contain the previous configuration. In that case, reapply the configuration changes and back up the configuration to state.tgz, automatically or manually.

To manually regenerate the state.tgz file after a configuration change:

Open a console to the ESXi host.

Run this command: /sbin/auto-backup.sh
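
For example, to confirm the backup actually ran, compare the timestamp on state.tgz before and after (a sketch; /bootbank is the usual location on an installable ESXi host):

ls -l /bootbank/state.tgz   # note the timestamp
/sbin/auto-backup.sh
ls -l /bootbank/state.tgz   # the timestamp should now be updated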


Le_Tadlo
Contributor

Have you tried the command I mentioned before?

/usr/sbin/auto-backup.sh

Run it after deleting the NFS lines from esx.conf.

Yes, I tried the command.

I'm considering adding the lines back to esx.conf and then trying to remove the shares properly. I'm running out of ideas. A reboot seems to work, but two of the hosts are production machines which I cannot reboot outside a service window.
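
For reference, the entries I removed followed the usual /nas/ format in esx.conf, something like this (values taken from the listings above; exact keys may vary by build):

/nas/Rko/enabled = "true"
/nas/Rko/host = "10.33.11.5"
/nas/Rko/readOnly = "false"
/nas/Rko/share = "/mnt/Rko"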

Lalegre
Virtuoso

Adding the lines back could be one workaround; then unmount the shares correctly.

And let me ask: why are you not able to reboot the hosts? Is vMotioning the VMs away not a possibility?
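
If vMotion is an option, the reboot itself is quick from the shell once the host is evacuated (a sketch; esxcli requires the --reason flag for the reboot):

esxcli system maintenanceMode set --enable true
esxcli system shutdown reboot --reason "clearing stale NFS mounts"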
