Hi!
I've upgraded our VCSA appliance from 5.5U2 to 6.0 and everything seemed to go fine, but after running for about 12 hours, the appliance began to eat up both vCPUs almost completely, rendering the web client as well as any SOAP interfaces pretty much useless.
Diagnosing the cause is a bit difficult for me because even the SSH daemon refuses to accept connections during those high-load periods.
Sometimes I've managed to get vimtop output, which basically told me that the "CIM health service" was taking up more than 100% CPU time. After that, I noticed that the health stats for my HP ProLiant G5 (still running a 5.5U2 HP image) were missing in the web client, although they are still accessible via WBEM directly on the host. So I simply deactivated the CIM server on that specific host ... but it didn't help.
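For anyone wanting to try the same thing: the CIM agent can be checked and switched off on the ESXi host itself from the command line. This is a sketch from memory — the exact syntax differs between 5.5 and 6.x builds, so verify against your host before relying on it:

```shell
# On the ESXi host (not the VCSA). Check whether the CIM/WBEM service is enabled:
esxcli system wbem get

# Disable it (6.x syntax):
esxcli system wbem set --enable false

# 5.5-style alternative: stop the CIM broker watchdog directly
/etc/init.d/sfcbd-watchdog stop
```

Note that this only stops the CIM provider on the host — in my case the vCenter-side load came back anyway.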
When looking at the VCSA's dmesg output, there are dozens of messages like this one:
IPfilter Dropped: IN=eth OUT= MAC:ff:ff:ff:ff:ff:ff:ff:ff (...) SRC=(varies) DST=255.255.255.255 LEN=68 TOS=0x00 PREC=0x00 TTL=128 ID=26090 PROTO=UDP SPT=58980 DPT=1947 LEN=48
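To see how frequent these drops are and which ports they target, a quick pipeline over the dmesg output helps. Sketched here against the sample line above (with the varying SRC replaced by a placeholder address); on the appliance you would pipe the real `dmesg` output in instead:

```shell
# Tally dropped-packet destination ports from dmesg-style IPfilter lines.
# On the appliance: dmesg | grep 'IPfilter Dropped' | grep -oE 'DPT=[0-9]+' | sort | uniq -c | sort -rn
log='IPfilter Dropped: IN=eth OUT= MAC:ff:ff:ff:ff:ff:ff:ff:ff SRC=192.0.2.10 DST=255.255.255.255 LEN=68 TOS=0x00 PREC=0x00 TTL=128 ID=26090 PROTO=UDP SPT=58980 DPT=1947 LEN=48'
echo "$log" | grep -oE 'DPT=[0-9]+' | sort | uniq -c | sort -rn
```

In this sample the drops all target UDP port 1947, which is commonly the SafeNet/Sentinel HASP license-manager broadcast from some machine on the subnet — the firewall dropping that is expected behaviour.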
Is this somehow related?
So my question is: does anyone have an idea how to troubleshoot this issue in order to track down the root cause?
Hi,
I am having the same problem: CPU load keeps climbing until it finally reaches 100%. I don't know what is causing it, but I guess it's the same problem you have. I also upgraded from 5.5U2 to 6.0 recently.
Did you fix it already? How can I check whether I have the same cause as you?
Best regards,
Roel.
I don't have a solution yet. After upgrading all hosts to ESXi 6, the CIM communication seems to be working again, but those 100% periods still recur every few hours.
Sometimes vCenter reports that the vPostgres service is causing this heavy workload ... which seems odd to me because my cluster only contains three hosts with about 15 VMs (most of them powered down).
Same.. it's a mess.
I can't even log into :5480 to enable the shell, so I can't get into Bash, run 'top', and see what is going on.
The CPU consumption is so all-consuming that the appliance is just completely unresponsive.
I'm running the latest VCSA, 6.0.0-3040890.
There.. just settled down..
I can get a hoard of /var/log collateral if anyone has a support contract to raise this with VMware .. I currently do not.
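For the next time it settles down: the Bash shell can also be enabled from a plain SSH session instead of :5480. On VCSA 6.0 the appliancesh prompt accepts this, if I remember the syntax right:

```shell
# At the VCSA 6.0 appliancesh "Command>" prompt after SSHing in:
shell.set --enabled true
shell        # drops into Bash, where top etc. are available
```

Of course this only helps if SSH itself still responds, which during the worst spikes it may not.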
To work around the issue, you need to delete the large number of vpx_event rows from the VCSA database.
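The detailed steps didn't make it into this post, but roughly, the cleanup is done with the psql client bundled on the appliance. The database name, user, paths and table names below are what I'd expect on a 6.0 appliance — verify them against your build, and take a backup or snapshot before touching anything:

```shell
# On the VCSA: stop vCenter first so nothing writes to the DB
service vmware-vpxd stop

# Connect to the embedded vPostgres database (path/name may vary by build)
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

# Then, inside psql, purge the event/task tables. TRUNCATE is fast but
# removes ALL historical events and tasks, not just the excess:
#   TRUNCATE TABLE vpx_event_arg, vpx_event, vpx_task;

# Afterwards, restart the service
service vmware-vpxd start
```

If you'd rather keep recent history, a targeted DELETE with a date condition instead of TRUNCATE would be the gentler option, at the cost of running much longer.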
Were any of you able to find a solution for this? I am experiencing the exact same issue at a client today. Please let me know if so, thanks!
Steve