Hello,
At first, I had a Linux VM (CentOS) connected via NFS to a NAS. At random intervals, the Linux guest hung with the following message in the kernel log:
As it happened several times, I created another Linux VM with a different distribution (Ubuntu), connected to the same NAS, and I got the same message:
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030860] [16883] 0 16883 18492 25 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030868] [17120] 0 17120 18460 27 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030876] [17355] 0 17355 18417 26 0 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030884] [17595] 0 17595 18443 27 2 0 0 sshd
Feb 15 08:59:59 srv-pedrosa kernel: [234285.030892] [17834] 0 17834 18450 24 2 0 0 sshd
Each VM has a different kernel version: CentOS -> 2.6.32 | Ubuntu -> 3.2.0
Googling, I found it could be due to vmmemctl. Has anyone seen this before?
My ESXi hosts are at version 4.1 Update 1, and this behaviour happens on all of them. Could it be because I am on an old ESXi version?
Thanks in advance,
Hey Damocue,
It would help if you could share the vmware.log for the affected VMs, as well as the /var/log/messages from the last hang.
Regards
a
vmware.log
Feb 14 20:22:06.800: vmx| GuestRpcSendTimedOut: message to toolbox timed out.
Feb 14 20:22:06.800: vmx| Vix: [17257331 guestCommands.c:2468]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest
Feb 14 20:35:42.185: vcpu-3| VMMouse: CMD Disable
Feb 14 20:35:42.185: vcpu-3| VMMouse: Disabling VMMouse mode
Feb 14 20:36:02.658: mks| MKS switching absolute mouse off
Feb 14 21:21:08.665: mks| MKS: Base polling period is 10000us
Feb 14 21:21:08.974: mks| VNCENCODE 4 encoding mode change: (720x400x24depth,32bpp,2880bytes/line)
Feb 14 21:21:14.105: mks| SVGA: display status changed, using optimizations for remote consoles.
Feb 14 21:21:49.728: vmx| Vix: [17257331 vmxCommands.c:392]: VMAutomation_Reset
Feb 14 21:21:49.728: vmx| Vix: [17257331 vmxCommands.c:457]: VMAutomation_Reset. Trying hard reset
Feb 14 21:21:49.728: vmx|
Feb 14 21:21:49.728: vmx|
Feb 14 21:21:49.728: vmx| VMXRequestReset
Feb 14 21:21:49.728: vmx| Stopping VCPU threads...
Is VMware HA VM monitoring enabled for the virtual machine? If so, can you turn it off and check whether the resets still occur?
http://kb.vmware.com/kb/1027734
The interesting bit is here
Feb 18 10:33:39 srv-doohan kernel: [17924] 0 17924 37172 3003 0 0 0 perl
Feb 18 10:33:39 srv-doohan kernel: [17986] 0 17986 26523 346 3 0 0 bash
Feb 18 10:33:39 srv-doohan kernel: [18000] 0 18000 26523 153 0 0 0 bash
Feb 18 10:33:39 srv-doohan kernel: [18001] 0 18001 37238 3080 2 0 0 perl
Feb 18 10:33:39 srv-doohan kernel: Out of memory: Kill process 1480 (vmtoolsd) score 1 or sacrifice child
Feb 18 10:33:39 srv-doohan kernel: Killed process 1480, UID 0, (vmtoolsd) total-vm:50092kB, anon-rss:284kB, file-rss:828kB
Feb 18 10:34:08 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:59867->[192.168.100.106]
Feb 18 10:34:08 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:59867->[192.168.100.106]
But what is more interesting is here
Feb 18 09:43:30 srv-doohan kernel: [15724] 0 15724 37238 3081 3 0 0 perl
Feb 18 09:43:30 srv-doohan kernel: [15729] 0 15729 25275 194 1 0 0 df
Feb 18 09:43:30 srv-doohan kernel: Out of memory: Kill process 2532 (Xvnc) score 3 or sacrifice child
Feb 18 09:43:30 srv-doohan kernel: Killed process 2532, UID 500, (Xvnc) total-vm:106780kB, anon-rss:8104kB, file-rss:1948kB
Feb 18 09:43:31 srv-doohan gnome-keyring-daemon[3092]: dbus failure unregistering from session: Connection is closed
Feb 18 09:43:31 srv-doohan gnome-keyring-daemon[3092]: dbus failure unregistering from session: Connection is closed
Feb 18 09:44:09 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:33162->[192.168.100.106]
Feb 18 09:44:09 srv-doohan snmpd[1959]: Connection from UDP: [192.168.100.77]:33162->[192.168.100.106]
and
Feb 18 09:43:29 srv-doohan kernel: beremote invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Feb 18 09:43:29 srv-doohan kernel: beremote cpuset=/ mems_allowed=0
Feb 18 09:43:29 srv-doohan kernel: Pid: 356, comm: beremote Not tainted 2.6.32-279.22.1.el6.x86_64 #1
Feb 18 09:43:29 srv-doohan kernel: Call Trace:
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff810c2c31>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81112f80>] ? dump_header+0x90/0x1b0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8120e2cc>] ? security_real_capable_noaudit+0x3c/0x70
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113402>] ? oom_kill_process+0x82/0x2a0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113341>] ? select_bad_process+0xe1/0x120
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81113840>] ? out_of_memory+0x220/0x3c0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8112355e>] ? __alloc_pages_nodemask+0x89e/0x940
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8115772a>] ? alloc_pages_current+0xaa/0x110
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81110407>] ? __page_cache_alloc+0x87/0x90
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81125f3b>] ? __do_page_cache_readahead+0xdb/0x210
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126091>] ? ra_submit+0x21/0x30
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126405>] ? ondemand_readahead+0x115/0x240
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81126623>] ? page_cache_sync_readahead+0x33/0x50
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81111d88>] ? generic_file_aio_read+0x558/0x700
Feb 18 09:43:29 srv-doohan kernel: [<ffffffffa02a209a>] ? nfs_file_read+0xca/0x130 [nfs]
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8117660a>] ? do_sync_read+0xfa/0x140
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8117b6e4>] ? cp_new_stat+0xe4/0x100
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81219e2b>] ? selinux_file_permission+0xfb/0x150
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8120cd06>] ? security_file_permission+0x16/0x20
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81176ef5>] ? vfs_read+0xb5/0x1a0
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff81177031>] ? sys_read+0x51/0x90
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff810d3c95>] ? __audit_syscall_exit+0x265/0x290
Feb 18 09:43:29 srv-doohan kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Feb 18 09:43:29 srv-doohan kernel: Mem-Info:
Feb 18 09:43:29 srv-doohan kernel: Node 0 DMA per-cpu:
Feb 18 09:43:29 srv-doohan kernel: CPU 0: hi: 0, btch: 1 usd: 0
Feb 18 09:43:29 srv-doohan kernel: CPU 1: hi: 0, btch: 1 usd: 0
There are various references to this, and they all point to the VM running out of memory:
http://askubuntu.com/questions/161521/why-does-my-server-freeze-everyday-at-the-same-time
http://ubuntuforums.org/showthread.php?s=14cf93e5feeed092bdfd0fdeebff02c3&t=1421823&page=2
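To see how often the guest is hitting this, it can help to pull every OOM kill out of the log in one pass. The following is only a rough Python sketch: the function name `parse_oom_kills` and the regular expression are my own, and they assume the "Killed process ..." line format shown in the excerpts above (with or without the "UID n," field).

```python
import re

# Matches kernel lines such as:
#   Feb 18 10:33:39 srv-doohan kernel: Killed process 1480, UID 0, (vmtoolsd)
#       total-vm:50092kB, anon-rss:284kB, file-rss:828kB
# The "UID n," part is optional, since not every kernel prints it.
KILLED_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*?"
    r"Killed process (?P<pid>\d+),? .*?\((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per 'Killed process' line found in the given log lines."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match is None:
            continue
        kills.append({
            "stamp": match.group("stamp"),
            "host": match.group("host"),
            "pid": int(match.group("pid")),
            "name": match.group("name"),
            "total_vm_kb": int(match.group("total_vm")),
            # Resident memory of the victim = anonymous + file-backed pages.
            "rss_kb": int(match.group("anon_rss")) + int(match.group("file_rss")),
        })
    return kills


if __name__ == "__main__":
    # Path taken from the thread; adjust for distributions that log elsewhere.
    with open("/var/log/messages", errors="replace") as log:
        for kill in parse_oom_kills(log):
            print(kill["stamp"], kill["name"], kill["pid"], kill["rss_kb"], "kB resident")
```

If the victims it lists only hold a few MB resident each, as vmtoolsd and Xvnc do in the excerpts above, that would suggest the memory is being consumed somewhere other than the killed processes.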
The question is, what applications are you running within the guests?
Are the hosts, and hence the VMs, overcommitted for memory/CPU?
Can you assign more memory to these guests and check whether the hangs still occur?
More than that, though, it looks like there is a memory leak in some application you are running. It would help to post this on the Ubuntu/CentOS forums as well.
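One way to narrow down a suspected leak is to total the resident memory per process name from the task list that the OOM killer dumps. This is a minimal Python sketch under stated assumptions: the columns are taken to be pid, uid, tgid, total_vm, rss, cpu, oom_adj, oom_score_adj, name (which is how I read the dumps above), rss is taken to be in 4 kB pages, and the names `TASK_RE`, `PAGE_KB` and `rss_by_name` are hypothetical.

```python
import re
from collections import Counter

# Assumed page size in kB (typical for x86_64 guests).
PAGE_KB = 4

# Matches task-dump lines with or without a "[uptime]" prefix, e.g.:
#   ... kernel: [17924] 0 17924 37172 3003 0 0 0 perl
#   ... kernel: [234285.030860] [16883] 0 16883 18492 25 2 0 0 sshd
TASK_RE = re.compile(
    r"kernel:\s+(?:\[\s*\d+\.\d+\]\s+)?"   # optional kernel uptime stamp
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<cpu>\d+)\s+"
    r"(?P<oom_adj>-?\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def rss_by_name(lines):
    """Sum resident memory (kB) per process name across task-dump lines."""
    totals = Counter()
    for line in lines:
        match = TASK_RE.search(line)
        if match:
            totals[match.group("name")] += int(match.group("rss")) * PAGE_KB
    return totals


if __name__ == "__main__":
    # Feed it the task list from a single OOM event, otherwise processes
    # that appear in several dumps are counted more than once.
    with open("/var/log/messages", errors="replace") as log:
        for name, kb in rss_by_name(log).most_common(10):
            print(f"{name:20s} {kb:10d} kB")
```

If the per-process totals come nowhere near the memory assigned to the guest, the usage is probably not in a user-space application, and the balloon driver or kernel caches would be worth a closer look.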
Regards
a