NSX controller v6.3.5 logrotate high cpu

jedijeff · 01-07-2019

Hi. We may have hit the known high CPU bug in NSX Controller v6.3.5.

We have vRealize Network Insight, and have turned off Controller polling.

We also followed this TID.

https://kb.vmware.com/s/article/56811

We don't really see any log files, yet within about 10 minutes of the controller starting there are about 15 logrotate processes, which max out all 4 CPUs. Eventually the controller disconnects from the cluster.
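One way to confirm this symptom is to count how many logrotate processes are running at once. The following is a hedged, generic Linux sketch in Python, not a VMware procedure. It assumes shell access with Python available and reads process names from `/proc`, which the NSX Controller appliance's restricted CLI may not expose. The function names and the threshold of 5 are hypothetical. On a typical Linux system logrotate runs once from cron, so more than a couple of concurrent copies is unusual.

```python
import os
from collections import Counter


def read_process_names(proc_root="/proc"):
    """Yield the short command name of every process under /proc (Linux only)."""
    if not os.path.isdir(proc_root):
        return  # not a Linux-style /proc; yield nothing
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                yield f.read().strip()
        except OSError:
            continue  # process exited or permission denied


def runaway(names, process="logrotate", threshold=5):
    """Return (count, flagged): flagged is True if more than `threshold` copies run.

    The threshold is a hypothetical value chosen for illustration.
    """
    count = Counter(names)[process]  # Counter returns 0 for missing names
    return count, count > threshold


if __name__ == "__main__":
    n, flagged = runaway(read_process_names())
    print(f"logrotate processes: {n}" + (" (more than expected)" if flagged else ""))
```

Keeping the counting logic separate from the `/proc` reader means the check can be fed a process list captured some other way, for example from saved `ps` output.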

Does anyone know what else to try? I worked with a very good VMware tech who spent a good hour going through it all with us, and pretty much the conclusion was to upgrade off of v6.3.5.

It's odd that it is only happening to 1 controller, though. I was told we can't really delete just 1 controller; we need to delete all 3 and re-create them. I am just curious why.

Thanks,

RShankar22 · 01-08-2019

You can try deleting the impacted controller.

mdac · 01-08-2019

In the past, VMware always recommended deleting the entire control cluster as a precaution when things went bad - even with a single controller. There were some 'quirks' in older builds of NSX that necessitated this. I don't recall the specifics, but I believe the replacement of the entire cluster prevented cluster election issues. In my personal experience, newer builds - 6.3.5 included - should be fine. That said, it's critical that you confirm your control cluster health before deleting a controller (show control-cluster status). Cluster majority is required for the control plane to function in a read/write capacity (i.e. 2 of 3 controllers need to be up). If the other two are completely healthy, you should be able to delete and re-create a single controller node without any noticeable control plane impact. I'd definitely do it during a maintenance window though to be safe.
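The majority rule above can be written as a small pre-check. This is a minimal Python sketch, not a VMware tool. It assumes you have read each controller's state from `show control-cluster status` by hand and recorded it in a dict; the controller names are hypothetical. For a 3-node cluster the majority is 2, so one node can be removed only if both of the others are healthy.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def safe_to_replace(node_status, target):
    """Check whether `target` can be deleted without losing cluster majority.

    node_status: dict mapping controller name -> True if healthy/connected
                 (filled in manually from `show control-cluster status`).
    target:      the controller you intend to delete and re-create.
    Returns True if the healthy nodes other than `target` still form a
    majority of the full cluster size.
    """
    size = len(node_status)
    healthy_others = sum(
        1 for name, healthy in node_status.items() if healthy and name != target
    )
    return healthy_others >= majority(size)
```

For example, `safe_to_replace({"controller-1": True, "controller-2": True, "controller-3": False}, "controller-3")` returns `True`, while the same call with `controller-2` also marked unhealthy returns `False`, because a single remaining node is not a majority of 3.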

Hope this helps.

My blog: https://vswitchzero.com Follow me on Twitter: @vswitchzero
