VMware Cloud Community
mewert
Contributor

helper world consuming 50% CPU

Greetings,

I have a newly built ESX 3.5 server running on a SunFire 40 dual-AMD, single-core system. I have zero (0) virtual machines running on it, and I noticed in the VI Client that 50% of the CPU was being consumed. Delving further with esxtop, I discovered that helper is indeed consuming 50% of the CPUs. Does anyone know what is causing this and how to make it stop? Or how to troubleshoot the problem further? With no VMs running it shouldn't be consuming 2 GHz of CPU horsepower :)

Thanks in advance!

M

mewert
Contributor

Ummmm.... OK - I rebooted the ESX Server and helper is no longer consuming 1/2 the CPU. I didn't think I'd have to apply the classic 'Microsoft Fix' to ESX Server... 😉
I will keep an eye on it and post if it acts up again.

RParker
Immortal

Well, since it was a new install, the 'helper' service may have been completing some initial tasks to update the server or configure the host for VC. So maybe this is by design. You didn't state how long you left it that way, so you may have been a bit hasty...

mewert
Contributor

Ok - it started happening again, so I left it alone. It's been at 50% for about two hours now. No VMs running - and the ESX Server has been left completely alone. Any ideas? I will examine the logs after more coffee 😄

RParker
Immortal

From the console, type ps -ef and see what it returns. Look for the process ID of that 'helper' service.

Then try kill with that process ID and see if it dies, or kill -9 if it doesn't. Then see if it comes back. Also, the line that helper runs on should show a path; what path is it pointing to?
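
If the ps -ef listing is long, the search can be scripted. The following is only a rough sketch in Python: the function name find_processes is made up for illustration, and the sample rows are illustrative service-console processes rather than output from your host. It assumes the usual eight-column ps -ef layout, with the command in the last column.

```python
def find_processes(ps_output, name):
    """Return (pid, cmd) pairs from `ps -ef` text whose CMD column mentions `name`."""
    matches = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the UID/PID/... header row
        fields = line.split(None, 7)  # eight columns; CMD keeps its spaces
        if len(fields) < 8:
            continue  # not a complete ps row
        pid, cmd = fields[1], fields[7]
        if name.lower() in cmd.lower():
            matches.append((int(pid), cmd))
    return matches


# Illustrative rows only (assumption: typical service-console processes)
SAMPLE = """\
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:45 ? 00:00:04 init
root 1421 1412 0 13:46 ? 00:00:15 /opt/vmware/vpxa/vpx/vpxa
root 1692 1648 0 13:46 ? 00:00:27 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u -a
"""

print(find_processes(SAMPLE, "hostd"))   # one match, PID 1692
print(find_processes(SAMPLE, "helper"))  # no match in this sample
```

An empty result would suggest that whatever 'helper' is, it is not an ordinary process visible to ps.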

mewert
Contributor

There doesn't seem to be anything named 'helper' exposed through ps -ef. Here is the output (below). I've also attached screenshots of esxtop and VirtualCenter showing the CPU utilization. Isn't 'helper' an ESX Server-specific service? Or is it part of Red Hat?
Thanks for your help!
M
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:45 ?        00:00:04 init
root         2     1  0 13:45 ?        00:00:00 [keventd]
root         3     1  0 13:45 ?        00:00:00 [ksoftirqd/0]
root         6     1  0 13:45 ?        00:00:00 [bdflush]
root         4     1  0 13:45 ?        00:00:00 [kswapd]
root         5     1  0 13:45 ?        00:00:00 [kscand]
root         7     1  0 13:45 ?        00:00:00 [kupdated]
root        19     1  0 13:45 ?        00:00:00 [vmkmsgd]
root        20     1  0 13:45 ?        00:00:00 [vmnixhbd]
root        24     1  0 13:45 ?        00:00:00 [vmkdevd]
root        25     1  0 13:45 ?        00:00:00 [scsi_eh_0]
root       533     1  0 13:45 ?        00:00:00 [scsi_eh_1]
root       534     1  0 13:45 ?        00:00:00 [scsi_eh_2]
root       562     1  0 13:45 ?        00:00:00 [kjournald]
root       613     1  0 13:45 ?        00:00:00 [khubd]
root       851     1  0 13:46 ?        00:00:00 [kjournald]
root       852     1  0 13:46 ?        00:00:00 [kjournald]
root       853     1  0 13:46 ?        00:00:00 [kjournald]
root       854     1  0 13:46 ?        00:00:00 [kjournald]
root      1366     1  0 13:46 ?        00:00:00 syslogd -m 0
root      1370     1  0 13:46 ?        00:00:00 klogd -x
root      1412     1  0 13:46 ?        00:00:00 /bin/sh /opt/vmware/vpxa/bin/vmware-watchdog -s vpxa -u 30 -q 5 /opt/vmware/vpxa/sbin/vpxa
root      1421  1412  0 13:46 ?        00:00:15 /opt/vmware/vpxa/vpx/vpxa
root      1434     1  0 13:46 ?        00:00:00 /usr/sbin/sshd
root      1488     1  0 13:46 ?        00:00:00 /usr/sbin/vmklogger
root      1549     1  0 13:46 ?        00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp       1564     1  0 13:46 ?        00:00:00 ntpd -U ntp -p /var/run/ntpd.pid
root      1589     1  0 13:46 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s webAccess -u 30 -q 5 /usr/lib/vmware/webAccess/java/jre1.5.0_07/bin/webAccess -server -Xincgc
root      1596  1589  0 13:46 ?        00:00:12 /usr/lib/vmware/webAccess/java/jre1.5.0_07/bin/webAccess -server -Xincgc -Djava.util.logging.manager=org.apache.juli.ClassLoaderL
root      1617     1  0 13:46 ?        00:00:00 crond
root      1628     1  0 13:46 ?        00:00:00 /usr/lib/vmware/bin/vmkload_app --setsid --sched.group=host/vim/vmkauthd --sched.mem.min=4 --sched.mem.max=12 /usr/lib/vmware/bin
root      1648     1  0 13:46 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/vmware-hostd-support /usr/sbin/vmware-hostd -u -a
root      1649     1  0 13:46 ?        00:00:00 logger -t VMware[init] -p daemon.err
root      1692  1648  0 13:46 ?        00:00:27 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u -a
root      1714     1  0 13:46 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s cimserver -u 60 -q 5 /var/pegasus/bin/cimserver daemon=false
root      1725  1714  0 13:46 ?        00:00:02 /var/pegasus/bin/cimserver daemon=false
root      1726     1  0 13:46 ?        00:00:00 /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.conf
root      1746     1  0 13:46 ?        00:00:00 /bin/sh /usr/bin/vmware-watchdog -s wsmand -u 60 -q 5 /sbin/wsmand -d
root      1754  1746  0 13:46 ?        00:00:00 /sbin/wsmand -d
root      1757     1  0 13:46 tty1     00:00:00 /sbin/mingetty tty1
root      1758     1  0 13:46 tty2     00:00:00 /sbin/mingetty tty2
root      1759     1  0 13:46 tty3     00:00:00 /sbin/mingetty tty3
root      1760     1  0 13:46 tty4     00:00:00 /sbin/mingetty tty4
root      1761     1  0 13:46 tty5     00:00:00 /sbin/mingetty tty5
root      1762     1  0 13:46 tty6     00:00:00 /sbin/mingetty -f /etc/issue.emergency -l /bin/login.emergency tty6
root      1838  1725  0 13:46 ?        00:00:00 /var/pegasus/bin/cimservera
root      1902  1434  0 13:51 ?        00:00:00 sshd: root@pts/0
root      1904  1902  0 13:51 pts/0    00:00:00 -bash
root      2296  1904  0 19:01 pts/0    00:00:00 ps -ef

mewert
Contributor

Digging a little further, I refreshed my recollection of 'helper'. 'Helper' is one of the esxtop "worlds" and represents an 'asynchronous task', per the guide to monitoring ESX 2 with esxtop (http://www.vmware.com/pdf/esx2_using_esxtop.pdf). I know this is 3.5, but I presume helper is still used the same way.

Now, what's interesting is that plain old 'top' shows the system as basically idle.

So, riddle me this, Batman: does 'top' show no CPU load because it only shows the performance of the service console and not the VMkernel? Or is the load shown in esxtop and VirtualCenter a "phantom load"? In other words, is this system really running at 50% of two CPUs?

Fun fun :)

I've attached the 'top' output screenshot.

mcowger
Immortal

Top shows only the service console.

esxtop includes the vmkernel.

--Matt VCDX #52 blog.cowger.us

RParker
Immortal

So does this mean you did an upgrade from 2.5 to 3.5?

mewert
Contributor

No - this is a completely fresh installation of 3.5 (complete format of the hard drive). I was simply citing the esxtop guide's reference to 'helper worlds', which was written in the 2.5 era. Since esxtop still shows 'helper', it appears that hasn't changed since 2.5 (but who can say).

So, as the other poster confirmed, top only shows the service console, which means something within the VMkernel is sapping half my CPUs. I perused all the logs (vmkernel, vmksummary, vmkwarning, messages) and there is nothing of interest - the logs are actually much "cleaner" than on my 3.0.2 servers :)

mewert
Contributor

More info.

Expanding the 'helper' group in esxtop shows that there are 22 "helper worlds" in all and that only helper0-1 is consuming CPU. Can anyone out there correlate helper0-1 to an actual ESX process? Attached is a screenshot showing the expanded helper worlds in esxtop.

M

p.s. I transferred a couple of VMs to the system, and when I fired them up, the system did reflect their load on top of the 50% being consumed by helper0-1. helper0-1 didn't 'back off' or do anything nice like that ;)
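
For anyone comparing readings across hosts, here is a small Python sketch of the "which world is actually busy" check done by eye above. Everything in it is an assumption for illustration: busiest_worlds is a made-up helper, the world names other than helper0-1 are invented, and the percentages are placeholders typed in by hand, not values read from esxtop.

```python
def busiest_worlds(samples, threshold=5.0):
    """Return (name, pct) pairs at or above `threshold`, busiest first."""
    hot = [(name, pct) for name, pct in samples if pct >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical readings, entered by hand for illustration only
samples = [
    ("helper0-0", 0.0),
    ("helper0-1", 50.0),
    ("helper1-0", 0.1),
]

for name, pct in busiest_worlds(samples):
    print(name, pct)
```

With a list like this from each affected host, it would be easy to see whether the same world is the culprit everywhere.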

mewert
Contributor

Update: the same thing happens with ESX 3i. I installed it today and the CPU immediately went to 50% utilization. Grrrr.....
Anyone out there have any ideas?
Thanks,
M

mewert
Contributor

Another update: I discovered the 'Advanced System Resource Settings' option under Configuration for the ESX Server in VirtualCenter and changed the CPU resource settings for 'helper' to a limit of 500 MHz. It immediately brought the system utilization down to 500 MHz. So that's another confirmation that it is helper world 0 consuming the CPU. I have absolutely no idea what helper world 0 does, but I will let you know what happens now that I've locked it at 500 MHz 🙂
If anyone out there is reading this and knows what helper world 0 does, or has run into this problem as well, it would be great to hear from you 🙂
Hacking on...
M
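
For a sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the host's total capacity is about 4000 MHz, which is only inferred from the earlier observation that 50% utilization was roughly 2 GHz; mhz_to_percent is a made-up helper name.

```python
def mhz_to_percent(limit_mhz, total_mhz):
    """Express a CPU limit in MHz as a percentage of total host capacity."""
    return 100.0 * limit_mhz / total_mhz


# Assumption: 50% utilization was about 2000 MHz, so total is about 4000 MHz
total_mhz = 2000 / 0.50

print(mhz_to_percent(500, total_mhz))  # 12.5
```

So under that assumption the 500 MHz cap holds helper to about an eighth of the host, down from half.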

VBDINO
Contributor

I have the same problem after a fresh install of ESX 3.5 (the release dated Feb 20). esxtop shows that helper0 is consuming around 98% of CPU time.

My server is an IBM x235, which is not supported by VMware. Is your server supported?

bernd2nd
Contributor

I have the same problem with four servers. I will now try moving the PCI cards to other slots.

VBDINO
Contributor

Even with the latest fixes, it still happens. As a workaround, I have set a CPU limit on the helper process. Because the x235 is not supported, I didn't open an SR.

Are your servers on the list of supported servers?

bernd2nd
Contributor

Yes, the servers (Primergy RX600S4) are on the HCL. I will update you after reordering the PCI cards.

bernd2nd
Contributor

Workaround for the RX600S4:

Do not use PCI-Slot 6 for ESX 3.5 at the moment.

jmlemmer
Contributor

Hi,

I have exactly the same issue on 2 brand new Sun Blade X6250 servers with ESX 3.5 (which is a configuration supported by both Sun and VMware).

I opened a support request with VMware and they told me that this issue has been seen in the past and should be fixed by an update of the BIOS of the server.

They also confirmed that it could be related to one of the PCI slots and that I should try different slots, which however proves to be difficult since the blades only have 2 slots and they are both used.

I then opened a service request with Sun, but they seem to be pointing to VMware to find a fix for the problem.

This leaves me stuck in the middle :(

So any ideas on how to solve the problem are welcome.

Thanks,

JM.

VBDINO
Contributor

On my x235, I have an unused Ethernet card in slot #5. I will try removing it and will also check for a BIOS update.

For jmlemmer: even though VMware responded that you need to apply a BIOS update and left it at that, you should insist on further investigation by another engineer. Some of them are not very curious and tend to close their incidents too easily.

As an example, we had a problem with ESX servers not failing over to controller B when we reset SAN controller A. The settings were correct. The first incident ended with VMware saying that it was an IBM problem, not a VMware one. Then we spoke to IBM, and even though they tried to say they had no problems, we insisted and ran all their dummy tests. Meanwhile I opened a new case with VMware, and the new engineer showed more interest in solving the problem. He came up with instructions for starting ESX with more debug information for the HBA and confirmed again that it was a problem with IBM. He also offered to speak with IBM, which probably helped IBM agree that their DS4500 SAN has a problem.

So if you have a support contract for the server and the BIOS update doesn't fix it, open a new case with VMware and ask them to speak with your contact at Sun. If the problem is in the hardware, VMware has to help you diagnose exactly what is not working, and only they have access to information on how to increase the diagnostic level in ESX.
