Hello all,
We are running six ESX hosts (HP ProLiant DL580 G5) in two clusters. Each cluster has three ESX 3.0.2 hosts (build 63195) with HA enabled and DRS enabled in fully automated mode.
The problem is that VMs (all W2K3 Standard) randomly become unresponsive. When the last VM became unresponsive, I found these records in /var/log/vmware/hostd.log:
Ticket issued for mks connections to user: vpxuser
Current value 169812 exceeds soft limit 122880.
Ticket issued for mks connections to user: vpxuser
Current value 169812 exceeds soft limit 122880.
Propagating stats from interval 20 to 300
Does anyone have any idea what may be causing these systems to become unresponsive? Or, if the problem is more complex, can you help me figure out why the last VM became unresponsive?
Thanks for the help.
What is the configuration of your ESX servers (memory, processors, storage, etc.)? How many VMs are you running? And what is the configuration of the VMs (number of virtual CPUs, memory)?
You might consider increasing your Service Console memory to see if that helps with your issue.
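One way to see whether those soft-limit warnings keep coming back after the change is simply to filter hostd.log for them. A minimal sketch (on a real host you would point grep at /var/log/vmware/hostd.log; the sample file below just stands in so the commands run anywhere):

```shell
# Sketch: count the soft-limit warnings in hostd.log.
# The sample file mirrors the messages quoted above; on the host,
# replace /tmp/hostd_sample.log with /var/log/vmware/hostd.log.
cat > /tmp/hostd_sample.log <<'EOF'
Ticket issued for mks connections to user: vpxuser
Current value 169812 exceeds soft limit 122880.
Propagating stats from interval 20 to 300
EOF
grep -c "soft limit" /tmp/hostd_sample.log
# In the Service Console, "free -m" shows how much of the console's
# RAM is actually in use before and after raising it.
```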
OK, here it is. All ESX servers are identical (HP ProLiant DL580 G5) and configured as follows:
1. Memory: Total 36861.3 MB / System 2799.3 MB / Virtual Machines 33790 MB / Service Console 272 MB
2. CPU: Intel Xeon, 2.1 GHz / 4 sockets / 4 cores per socket / 16 logical CPUs / Hyperthreading disabled
3. Storage adapter: QLA2432
4. Storage: Fibre Channel SAN
Each ESX hosts 3 VMs (W2K3 Standard), each configured with 2 vCPUs / 4096 MB RAM / 144.32 MB overhead / VMware Tools installed / a separate 752 GB LUN per VM.
Thanks, guys. I'll have a look at it and post here whether it helped.
I increased the console memory as suggested and, purely for troubleshooting purposes, also deactivated DRS. The messages are gone now, but this morning another VM became unresponsive.
So I checked the logs; here is what I found in /var/log/vmkernel for that machine. I'm now going to check the other logs as well.
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu10:1073)World: vm 1073: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu11:1074)World: vm 1074: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu8:1088)World: vm 1088: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu8:1087)World: vm 1087: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.427 cpu13:1086)World: vm 1086: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.453 cpu5:1091)World: vm 1091: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.453 cpu6:1090)World: vm 1090: 3864: Killing self with status=0x0:Success
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.335 cpu5:1071)World: vm 1114: 690: Starting world vmm0:SWP-VMMU004 with flags 8
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4836: adding 'vmm0:SWP-VMMU004': group 'host/user': cpu: shares=-3 min=0 max=-1
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4849: renamed group 14 to vm.1071
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4863: moved group 14 to be under group 4
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.342 cpu5:1071)Swap: vm 1114: 1426: extending swap to 4194304 KB
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.350 cpu5:1071)World: vm 1115: 690: Starting world vmm1:SWP-VMMU004 with flags 8
What else should I do?
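For anyone comparing notes, a rough way to pull the "Killing self" entries out of the vmkernel log around the event time (assuming the standard ESX 3.x log location; the sample file below just makes the commands runnable anywhere):

```shell
# Sketch: filter vmkernel for worlds that killed themselves.
# The sample mirrors the entries quoted above; on the host you would run
# something like: grep "Killing self" /var/log/vmkernel
cat > /tmp/vmkernel_sample.log <<'EOF'
Jan 26 11:35:32 swp-esx0005 vmkernel: World: vm 1073: Killing self with status=0x0:Success
Jan 26 11:35:36 swp-esx0005 vmkernel: World: vm 1114: Starting world vmm0:SWP-VMMU004 with flags 8
EOF
grep -c "Killing self" /tmp/vmkernel_sample.log
# Narrowing by timestamp (e.g. grep "Jan 26 11:35" ...) then comparing
# against hostd.log for the same minute helps correlate the two logs.
```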
Also, please have a look at all these messages from hostd:
Task Created : haTask-480-vim.VirtualMachine.reset-456
Registered Foundry callback on 2
Adding task: haTask-480-vim.VirtualMachine.reset-456
VM State transition requested to VM_STATE_RESETTING
Event 6 : SWP-VMMU004 on swp-esx0005.deutschepost.dpwn.com in ha-datacenter is reset
State Transition (VM_STATE_ON -> VM_STATE_RESETTING)
VM State transition post act for VM_STATE_RESETTING
CheckLicenses: Checking licenses based on VM features used.
GetHardwareCapability: Checking VM Feature Utilization.
GetHardwareCapability: VM Capability used : san
GetHardwareCapability: VM Capability used : vsmp
Tracking progress for method : vim.VirtualMachine.reset
Retrieved current power state from foundry 1
Updated VM state machine with new power state: 1
DISKLIB-VMFS : "/vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004-flat.vmdk" : open successful (17) size = 17179869184, hd = -1. Type 3
DISKLIB-VMFS : "/vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004-flat.vmdk" : closed.
Time to gather config: 20 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966136898465 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Tools are not operations ready
Time to gather config: 17 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966136943148 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Retrieved current power state from foundry 1
Disconnect check in progress: /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx
Time to gather config: 17 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966136993930 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Retrieved current power state from foundry 1
Updated VM state machine with new power state: 1
Retrieved current power state from foundry 1
VM State transition requested to VM_STATE_ON
Event 7 : SWP-VMMU004 on swp-esx0005.deutschepost.dpwn.com in ha-datacenter is powered on
State Transition (VM_STATE_RESETTING -> VM_STATE_ON)
VM State transition post act for VM_STATE_ON
Updating current power state: 1
Posting vmevent to '/vm/runtime/powerop/':
timestamp = 1232966137045560 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx oldPowerState = 1 newPowerState = 1
UpdateOverhead: is64Bit = false
Task Completed : haTask-480-vim.VirtualMachine.reset-456
Removing task: haTask-480-vim.VirtualMachine.reset-456
Received state change for VM '480'
Retrieved current power state from foundry 1
Adding vm 480 to poweredOnVms list
Retrieved current power state from foundry 1
Time to gather config: 18 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966137115489 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Time to gather config: 26 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966137164131 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Time to gather config: 19 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966137205985 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Retrieved current power state from foundry 1
Time to gather config: 17 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966137246111 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Retrieved current power state from foundry 1
Retrieved current power state from foundry 1
Retrieved current power state from foundry 1
Running status of tools changed to: notRunning
Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-458
Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-458
Time to gather config: 18 (msecs)
Posting vmevent to '/vm/reconfig/':
timestamp = 1232966137344113 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure
Sending notification failed to receiver post. Status: 1, Command output: snmptrap: Unknown host (Resource temporarily unavailable)
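As a side note, the last line ("snmptrap: Unknown host") suggests the Service Console cannot resolve the SNMP trap receiver's hostname. That is probably a separate issue from the VM hangs, but worth ruling out. A quick, hedged sanity check (the actual receiver name depends on your SNMP configuration):

```shell
# Sketch: rule out basic name-resolution problems in the Service Console.
# First confirm the resolver works at all; then, on the host, you would
# look up the actual trap receiver's name from your SNMP configuration.
getent hosts localhost >/dev/null && echo "resolver OK"
# e.g. on the host: nslookup <your-trap-receiver>
```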