Linux hangs randomly

damocue · 02-15-2013

Hello,

Initially I had a Linux VM (CentOS) connected to a NAS via NFS. At random times the guest hung, with messages like the following in the kernel log:

Feb 12 20:44:31 srv-doohan kernel: [ 2940] 0 2940 24532 313 3 0 0 sshd
Feb 12 20:44:31 srv-doohan kernel: [ 3625] 0 3625 24532 294 2 0 0 sshd
Feb 12 20:44:31 srv-doohan kernel: [ 4301] 0 4301 24547 335 3 0 0 sshd
Feb 12 20:44:31 srv-doohan kernel: [ 4662] 0 4662 24530 304 1 0 0 sshd
Feb 12 20:44:31 srv-doohan kernel: [ 4978] 0 4978 24530 302 1 0 0 sshd

Since it happened several times, I created another Linux VM with a different distribution (Ubuntu), connected to the same NAS, and got the same messages:

Feb 15 08:59:59 srv-pedrosa kernel: [234285.030860] [16883] 0 16883 18492 25 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030868] [17120] 0 17120 18460 27 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030876] [17355] 0 17355 18417 26 0 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030884] [17595] 0 17595 18443 27 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030892] [17834] 0 17834 18450 24 2 0 0 sshd

The two VMs run different kernel versions: CentOS runs 2.6.32 and Ubuntu runs 3.2.0.

From searching online, it could be caused by vmmemctl (the VMware Tools memory balloon driver). Has anyone seen this before?

My ESXi hosts run version 4.1 Update 1, and this behaviour happens on all of them. Could it be caused by running an old ESXi version?

Thanks in advance,

a_nut_in · 02-15-2013

Hey Damocue,

It would help if you could share the vmware.log for the affected VMs, as well as /var/log/messages from the last hang.

Regards

a

damocue · 02-18-2013

Hi,

It happened again this morning on the CentOS VM. I've attached vmware.log, plus dmesg and messages from /var/log.

Regards,

a_nut_in · 02-18-2013

From vmware.log:

Feb 14 20:22:06.800: vmx| GuestRpcSendTimedOut: message to toolbox timed out.
Feb 14 20:22:06.800: vmx| Vix: [17257331 guestCommands.c:2468]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest
Feb 14 20:35:42.185: vcpu-3| VMMouse: CMD Disable
Feb 14 20:35:42.185: vcpu-3| VMMouse: Disabling VMMouse mode
Feb 14 20:36:02.658: mks| MKS switching absolute mouse off
Feb 14 21:21:08.665: mks| MKS: Base polling period is 10000us
Feb 14 21:21:08.974: mks| VNCENCODE 4 encoding mode change: (720x400x24depth,32bpp,2880bytes/line)
Feb 14 21:21:14.105: mks| SVGA: display status changed, using optimizations for remote consoles.
Feb 14 21:21:49.728: vmx| Vix: [17257331 vmxCommands.c:392]: VMAutomation_Reset
Feb 14 21:21:49.728: vmx| Vix: [17257331 vmxCommands.c:457]: VMAutomation_Reset. Trying hard reset
Feb 14 21:21:49.728: vmx|
Feb 14 21:21:49.728: vmx|
Feb 14 21:21:49.728: vmx| VMXRequestReset
Feb 14 21:21:49.728: vmx| Stopping VCPU threads...

Is VMware HA VM Monitoring enabled for this virtual machine? If so, can you turn it off and check whether the resets stop?

http://kb.vmware.com/kb/1027734
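The vmware.log excerpt above shows a guest RPC to VMware Tools timing out, followed later by a reset requested through VIX. To check whether every reset on this VM follows the same pattern, both kinds of event can be pulled out of the full vmware.log. This is a minimal sketch: the line format is taken from the excerpt in this thread, and the marker strings are simply the ones that appear there; other ESXi versions may log differently.

```python
import re

# vmware.log lines in this thread look like:
#   "Feb 14 21:21:49.728: vmx| Vix: [17257331 vmxCommands.c:392]: VMAutomation_Reset"
LINE_RE = re.compile(r"^(?P<ts>\w{3} \d{1,2} [\d:.]+): (?P<thread>[\w-]+)\| (?P<msg>.*)$")

# Marker strings copied from the excerpt; assumed, not an exhaustive list.
EVENTS = {
    "GuestRpcSendTimedOut": "guest RPC to tools timed out",
    "VIX_E_TOOLS_NOT_RUNNING": "tools reported not running",
    "VMAutomation_Reset": "reset requested",
}

def tools_and_reset_events(lines):
    """Return (timestamp, description) for tools-timeout and reset events, in log order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skips continuation lines and empty "vmx|" entries
        for marker, description in EVENTS.items():
            if marker in m["msg"]:
                events.append((m["ts"], description))
                break  # count each log line at most once
    return events

if __name__ == "__main__":
    import sys
    with open(sys.argv[1], errors="replace") as f:
        for ts, what in tools_and_reset_events(f):
            print(ts, what)
```

If each "reset requested" entry is preceded by tools timeouts, that is consistent with something outside the guest (such as HA VM Monitoring) resetting the VM after losing contact with VMware Tools.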

The interesting bit is here:

Feb 18 10:33:39 srv-doohan kernel: [17924]     0 17924    37172     3003   0       0             0 perl
Feb 18 10:33:39 srv-doohan kernel: [17986]     0 17986    26523      346   3       0             0 bash
Feb 18 10:33:39 srv-doohan kernel: [18000]     0 18000    26523      153   0       0             0 bash
Feb 18 10:33:39 srv-doohan kernel: [18001]     0 18001    37238     3080   2       0             0 perl
Feb 18 10:33:39 srv-doohan kernel: Out of memory: Kill process 1480 (vmtoolsd) score 1 or sacrifice child
Feb 18 10:33:39 srv-doohan kernel: Killed process 1480, UID 0, (vmtoolsd) total-vm:50092kB, anon-rss:284kB, file-rss:828kB
Feb 18 10:34:08 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:59867->[192.168.100.106]
Feb 18 10:34:08 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:59867->[192.168.100.106]

But even more interesting is this:

Feb 18 09:43:30 srv-doohan kernel: [15724] 0 15724 37238 3081 3 0 0 perl
Feb 18 09:43:30 srv-doohan kernel: [15729] 0 15729 25275 194 1 0 0 df
Feb 18 09:43:30 srv-doohan kernel: Out of memory: Kill process 2532 (Xvnc) score 3 or sacrifice child
Feb 18 09:43:30 srv-doohan kernel: Killed process 2532, UID 500, (Xvnc) total-vm:106780kB, anon-rss:8104kB, file-rss:1948kB
Feb 18 09:43:31 srv-doohan gnome-keyring-daemon[3092]: dbus failure unregistering from session: Connection is closed
Feb 18 09:43:31 srv-doohan gnome-keyring-daemon[3092]: dbus failure unregistering from session: Connection is closed
Feb 18 09:44:09 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:33162->[192.168.100.106]
Feb 18 09:44:09 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:33162->[192.168.100.106]

And this:

Feb 18 09:43:29 srv-doohan kernel: beremote invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Feb 18 09:43:29 srv-doohan kernel: beremote cpuset=/ mems_allowed=0
Feb 18 09:43:29 srv-doohan kernel: Pid: 356, comm: beremote Not tainted 2.6.32-279.22.1.el6.x86_64 #1
Feb 18 09:43:29 srv-doohan kernel: Call Trace:
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff810c2c31>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81112f80>] ? dump_header+0x90/0x1b0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8120e2cc>] ? security_real_capable_noaudit+0x3c/0x70
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113402>] ? oom_kill_process+0x82/0x2a0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113341>] ? select_bad_process+0xe1/0x120
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113840>] ? out_of_memory+0x220/0x3c0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8112355e>] ? __alloc_pages_nodemask+0x89e/0x940
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8115772a>] ? alloc_pages_current+0xaa/0x110
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81110407>] ? __page_cache_alloc+0x87/0x90
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81125f3b>] ? __do_page_cache_readahead+0xdb/0x210
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126091>] ? ra_submit+0x21/0x30
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126405>] ? ondemand_readahead+0x115/0x240
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126623>] ? page_cache_sync_readahead+0x33/0x50
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81111d88>] ? generic_file_aio_read+0x558/0x700
Feb 18 09:43:29 srv-doohan kernel: [<ffffffffa02a209a>] ? nfs_file_read+0xca/0x130 [nfs]
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8117660a>] ? do_sync_read+0xfa/0x140
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8117b6e4>] ? cp_new_stat+0xe4/0x100
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81219e2b>] ? selinux_file_permission+0xfb/0x150
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8120cd06>] ? security_file_permission+0x16/0x20
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81176ef5>] ? vfs_read+0xb5/0x1a0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81177031>] ? sys_read+0x51/0x90
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff810d3c95>] ? __audit_syscall_exit+0x265/0x290
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 18 09:43:29 srv-doohan kernel: Mem-Info:
Feb 18 09:43:29 srv-doohan kernel: Node 0 DMA per-cpu:
Feb 18 09:43:29 srv-doohan kernel: CPU 0: hi: 0, btch: 1 usd: 0
Feb 18 09:43:29 srv-doohan kernel: CPU 1: hi: 0, btch: 1 usd: 0

There are various references to this, and all of them point to the VM running out of memory:

http://askubuntu.com/questions/161521/why-does-my-server-freeze-everyday-at-the-same-time

http://ubuntuforums.org/showthread.php?s=14cf93e5feeed092bdfd0fdeebff02c3&t=1421823&page=2
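Rather than relying on excerpts, every OOM kill in the log can be listed to confirm the out-of-memory diagnosis across all the hangs. A minimal sketch, assuming the RHEL 6-style `Killed process <pid>, UID <uid>, (<name>) total-vm:...` wording seen in the CentOS log above; the Ubuntu 3.2 kernel may word this line differently, in which case the pattern needs adjusting.

```python
import re
from collections import Counter

# Assumed format (from the CentOS log in this thread):
# "Killed process 1480, UID 0, (vmtoolsd) total-vm:50092kB, anon-rss:284kB, file-rss:828kB"
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, file-rss:(?P<file>\d+)kB"
)

def oom_victims(lines):
    """Yield (timestamp, name, pid, anon_rss_kb) for every OOM kill found."""
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            # The syslog timestamp is the first three fields, e.g. "Feb 18 10:33:39"
            timestamp = " ".join(line.split()[:3])
            yield timestamp, m["name"], int(m["pid"]), int(m["anon"])

def summarize(lines):
    """Count how often each process name was chosen as the OOM victim."""
    return Counter(name for _, name, _, _ in oom_victims(lines))

if __name__ == "__main__":
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as f:
        for ts, name, pid, anon in oom_victims(f):
            print(f"{ts}  killed {name} (pid {pid}, anon-rss {anon} kB)")
```

Note that the victim is not necessarily the process that is leaking: in the excerpts above the kernel killed small processes such as vmtoolsd and Xvnc, which suggests memory was already exhausted by something else.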

The question is: what applications are you running within the guests?
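One way to start answering that from the logs already posted: before each OOM kill the kernel dumps a task table (the `[ pid ] ... name` rows quoted in the first post). Summing the `rss` column per process name shows which applications held the most memory at that moment. A minimal sketch, assuming the column order `[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name` used by 2.6.32 (RHEL 6) and 3.2 kernels, and 4 KiB pages as on x86_64.

```python
import re
from collections import defaultdict

# One row of the kernel's OOM task dump, e.g.
#   "[17924]     0 17924    37172     3003   0       0             0 perl"
# Assumed column order: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<cpu>\d+)\s+(?P<oom_adj>-?\d+)\s+(?P<score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)
PAGE_KB = 4  # assumption: 4 KiB pages (x86_64)

def rss_by_name(lines):
    """Sum resident memory (kB) per process name over one OOM task dump."""
    totals = defaultdict(int)
    for line in lines:
        # search() skips the "[234285.030860]" uptime stamp on Ubuntu lines,
        # because it contains a dot and cannot match \[\s*\d+\]
        m = TASK_RE.search(line)
        if m:
            totals[m["name"]] += int(m["rss"]) * PAGE_KB
    # Largest consumers first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Feed it only the lines of a single dump (between "invoked oom-killer" and "Killed process"); passing a whole log would add up rows from several dumps.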

Are the hosts, and hence the VMs, overcommitted on memory or CPU?

Can you assign more memory to these guests and check?

More than that, it looks like there is a memory leak in some application you are running. It would help to post this on the Ubuntu/CentOS forums as well.
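To test the memory-leak theory before the next hang, one option is to snapshot each process's resident memory (the `VmRSS` line of `/proc/<pid>/status`) twice, some time apart, and rank processes by growth. A minimal sketch; the 10-minute interval is a hypothetical choice, and this is a generic Linux approach rather than anything specific to VMware.

```python
import os
import time

def parse_vmrss(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or None (e.g. kernel threads)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:\t   1234 kB"
    return None

def snapshot():
    """Map pid -> (name, VmRSS kB) for every process currently visible in /proc."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                text = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        name = text.split("\n", 1)[0].split(":", 1)[1].strip()  # first line is "Name:\t..."
        rss = parse_vmrss(text)
        if rss is not None:
            result[int(entry)] = (name, rss)
    return result

def growth(before, after):
    """Per-pid RSS growth in kB between two snapshots, largest first."""
    deltas = [
        (pid, after[pid][0], after[pid][1] - before[pid][1])
        for pid in after.keys() & before.keys()
    ]
    return sorted(deltas, key=lambda d: d[2], reverse=True)

if __name__ == "__main__":
    INTERVAL = 600  # hypothetical: 10 minutes between snapshots
    first = snapshot()
    time.sleep(INTERVAL)
    for pid, name, delta in growth(first, snapshot())[:10]:
        print(f"{pid:>7} {name:<20} +{delta} kB")
```

A process that grows on every run while the workload stays constant is a leak candidate. PIDs can be reused between snapshots, so a short interval or a name check guards against false matches.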

Regards

a