I have just upgraded my lab from 3.2.1 to 4.0. It did not go as smoothly as I expected, but I got through.
Now I have a strange error (at least it is to me) where an alarm constantly comes back with "Application crashed".
When tailing /var/log/syslog I see these entries, but I am not sure whether they are related or a different error:
2022-08-15T11:53:57.155Z nsxt.lab.local NSX 32752 - [nsx@6876 comp="nsx-manager" s2comp="nsx-net" tid="32761" level="WARNING"] StreamConnection[8712 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:8712] Couldn't connect to 'unix:///var/run/vmware/nestdb/nestdb-server.sock' (error: 2-No such file or directory)
2022-08-15T11:53:57.156Z nsxt.lab.local NSX 32752 - [nsx@6876 comp="nsx-manager" s2comp="nsx-net" tid="32761" level="WARNING"] StreamConnection[8712 Error to unix:///var/run/vmware/nestdb/nestdb-server.sock sid:-1] Error 2-No such file or directory
2022-08-15T11:53:57.156Z nsxt.lab.local NSX 32752 - [nsx@6876 comp="nsx-manager" s2comp="nsx-rpc" tid="32761" level="WARNING"] RpcConnection[8712 Connecting to unix:///var/run/vmware/nestdb/nestdb-server.sock 0] Couldn't connect to unix:///var/run/vmware/nestdb/nestdb-server.sock (error: 2-No such file or directory)
2022-08-15T11:53:57.156Z nsxt.lab.local NSX 32752 - [nsx@6876 comp="nsx-manager" s2comp="nsx-rpc" tid="32761" level="WARNING"] RpcTransport[0] Unable to connect to unix:///var/run/vmware/nestdb/nestdb-server.sock: 2-No such file or directory
2022-08-15T11:53:57.156Z nsxt.lab.local NSX 32752 - [nsx@6876 comp="nsx-manager" s2comp="nestdb-client" tid="32752" level="WARNING"] NestDbClient: failed to get stub to unix:///var/run/vmware/nestdb/nestdb-server.sock, retrying in 5000 ms...
Looks like the service nestdb is missing?
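One way to check whether all of these warnings point at the same missing socket is to group them by subcomponent and socket path. Below is a minimal Python sketch, assuming the syslog line format shown above; the function name summarize_socket_warnings is made up for illustration and is not part of NSX.

```python
import re
from collections import Counter

# Key="value" pairs from the structured-data part of a syslog line, e.g.
# [nsx@6876 comp="nsx-manager" s2comp="nsx-net" tid="32761" level="WARNING"]
_FIELDS = re.compile(r'(\w+)="([^"]*)"')
# First unix socket path mentioned in the message (assumed to end in .sock).
_SOCKET = re.compile(r"unix://(/\S+?\.sock)")


def summarize_socket_warnings(lines):
    """Count WARNING lines per (s2comp, socket path) pair."""
    counts = Counter()
    for line in lines:
        fields = dict(_FIELDS.findall(line))
        sock = _SOCKET.search(line)
        if fields.get("level") == "WARNING" and sock:
            counts[(fields.get("s2comp", "?"), sock.group(1))] += 1
    return counts
```

Fed with the five lines above, this reports nsx-net and nsx-rpc twice each and nestdb-client once, all for /var/run/vmware/nestdb/nestdb-server.sock, which suggests a single root cause rather than several unrelated errors.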
I do know this is a lab environment, but I am a bit uncertain about upgrading customer installations as long as I do not know what this error is about, and would like to be certain before doing so.
Any idea on what this could be is highly appreciated!
Hello, I currently have the same error in my lab stage as well! May I ask whether the GENEVE tunnels can be built when you have running VMs on an NSX-T segment? I am not sure whether this error has any effect on building GENEVE tunnels.
I don't see any problems in the tunnels, all are up. And the VMs can communicate.
/Martin
I have the same error as well...
I also have the same error here after the upgrade from 3.2 to 4.0.
I also have the same error with the same NSX version.
In my case, I observed that /var/core/ on the ESXi hosts had an sfcb service core dump. All transport nodes had the same core dump with the same date and hour.
As it is a production environment, I opened a VMware support case to have the service core dump analyzed.
Please let us know the outcome of the SR!
I tried to power down the NSX manager and restore from a backup, but that did not work either. It just came back with an "internal server error" when attempting to restore the database. I would have thought the backup process did an integrity check of the backup before reporting "completed successfully"... but no.
Instead, I deleted the NSX manager instance that gave the error and deployed a new instance, and the error is gone.
/Martin
Hi leotaglietti,
Are you able to share the result of your interaction with support (or PM the ID of the service request)? I'm having the same or a very similar issue in my environment. I'm seeing the same errors in my logs, and I'm also seeing various components of the UI not displaying correctly (for instance, the Policy UI doesn't show DFW statistics, whereas the Manager UI does).
I've got a support call open at the moment as well; however, the support agent is requesting ~10 GB of logs, which I can't provide due to the location of the environment.
Hello Team.
I recently had this problem solved with VMware GSS. I raised a support request with the NSX team, and the NSX support engineer told me that NSX was OK and that he did not find any problem. Together with the NSX support engineer, we identified that the "Application crashed" error shown in the NSX Manager GUI was related to a core dump in /var/core/ on each transport node (ESXi host). Since this was a service core dump, it was not a human-readable file, so the VMware GSS ESXi team read the service core dump and saw that an HPE module called smx was causing it.
We uninstalled this smx module (which we weren't using) and the problem went away.
PS: To stop the "Application crashed" message, you must move or delete the core dump file in /var/core/.
I observed an interesting thing about this core dump: it was generated only when the ESXi host was being prepared as an NSX transport node or when we chose to remove NSX from the host. After removing the smx module, the core dump no longer shows up during host preparation or during NSX removal.
Maybe the problem in your environment is caused by another module, so I suggest you check /var/core/ on the ESXi host and see whether a core dump is generated while preparing the host as an NSX transport node or while removing NSX.
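To check whether a core dump coincides with host preparation or NSX removal, the file modification times can be compared with the time of that operation. The following is only a rough Python sketch: it assumes the dump directory (or a copy of it) is readable from wherever the script runs, and the function name files_modified_near and the 30-minute tolerance are arbitrary choices for illustration.

```python
import os
from datetime import datetime, timedelta, timezone


def files_modified_near(directory, event_time, tolerance=timedelta(minutes=30)):
    """Return names of files in `directory` modified within `tolerance` of `event_time`.

    `event_time` must be a timezone-aware datetime, for example the time
    at which the host was prepared as a transport node.
    """
    matches = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        modified = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
        if abs(modified - event_time) <= tolerance:
            matches.append(entry.name)
    return sorted(matches)
```

For example, files_modified_near("/var/core", datetime(2022, 8, 15, 11, 50, tzinfo=timezone.utc)) would list the files written within half an hour of that (made-up) preparation time.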
I am using Cisco UCS servers and they do not have that module installed, so I am not sure that this was causing the issue in my lab.
I have not seen the problem since I redeployed the NSX node.
For me it was due to stale core dumps which had been there for some time already:
root@nsx-mgr3:~# ls -al /var/dump
total 1440
drwxrwx--- 3 root nsx 4096 Sep 23 06:12 .
drwxr-xr-x 15 root root 4096 Oct 9 05:22 ..
-rw-rw-rw- 1 root root 688915 Aug 7 2021 core.snmpd.1628373629.1091.0.11.gz
-rw-r--r-- 1 root root 757597 Sep 23 06:12 core.snmpd.1663913558.7701.0.11.gz
drwx------ 2 root root 16384 Feb 13 2020 lost+found
It seems that NSX 4.0 reports this as an alarm, which was not previously the case.
Deleting these gz files resolved the alarm.
Run the following command from the NSX Manager that has this alarm:
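The number in each file name in the listing above looks like a Unix timestamp (1628373629 is 2021-08-07 and 1663913558 is 2022-09-23, matching the dates shown by ls), which makes it easy to see how old a dump is before deleting it. A small Python sketch under that assumption; the name layout core.<process>.<epoch>.<...>.gz is inferred from this listing only, and list_core_dumps is a made-up helper name.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Assumed layout, inferred from the listing: core.<process>.<epoch seconds>.<other fields>.gz
_CORE_NAME = re.compile(r"^core\.(?P<process>[^.]+)\.(?P<epoch>\d+)\..*\.gz$")


def list_core_dumps(dump_dir="/var/dump"):
    """Return (file name, process, UTC timestamp) per core dump archive, oldest first."""
    directory = Path(dump_dir)
    if not directory.is_dir():
        return []
    found = []
    for path in directory.iterdir():
        match = _CORE_NAME.match(path.name)
        if match and path.is_file():
            created = datetime.fromtimestamp(int(match["epoch"]), tz=timezone.utc)
            found.append((path.name, match["process"], created))
    return sorted(found, key=lambda item: item[2])
```

Anything that does not match the assumed pattern, such as lost+found, is ignored, so the result only contains candidates for cleanup.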
del core-dump all
