VMware Cloud Community
DusanPohl
Contributor

VMs randomly unresponsive, ESX 3.0.2, ProLiant DL580 G5

Hello all,

We are running six ESX hosts (HP ProLiant DL580 G5) in two clusters. Each cluster has three hosts running ESX 3.0.2, build 63195, with HA activated and DRS activated in fully automated mode.

The problem is that VMs (all W2K3 Standard) randomly become unresponsive. When the last VM became unresponsive, I found these records in /var/log/vmware/hostd.log:

Ticket issued for mks connections to user: vpxuser

Current value 169812 exceeds soft limit 122880.

Ticket issued for mks connections to user: vpxuser

Current value 169812 exceeds soft limit 122880.

Propagating stats from interval 20 to 300
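
The warnings can be filtered out of the log with a quick grep. A minimal self-contained sketch (a captured sample stands in for /var/log/vmware/hostd.log, which is where the lines above come from):

```shell
# Minimal sketch: count the soft-limit warnings in a hostd log.
# On the host, LOG would be /var/log/vmware/hostd.log; a captured
# sample is used here so the snippet is self-contained.
LOG=hostd-sample.log
cat > "$LOG" <<'EOF'
Ticket issued for mks connections to user: vpxuser
Current value 169812 exceeds soft limit 122880.
Ticket issued for mks connections to user: vpxuser
Current value 169812 exceeds soft limit 122880.
Propagating stats from interval 20 to 300
EOF
grep -c "exceeds soft limit" "$LOG"   # number of soft-limit warnings
```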

Does anyone have any idea what may be causing these systems to become unresponsive? Or, if the problem is more complex, can you help me figure out why the last VM became unresponsive?

Thanks for the help.

8 Replies
weinstein5
Immortal

What is the configuration of your ESX Server (memory, processors, storage, etc.)? How many VMs are you running? What is the configuration of the VMs (number of virtual processors, memory)?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

Troy_Clavell
Immortal

You may want to try increasing your Service Console memory to see if that helps with your issue.

http://kb.vmware.com/kb/1003501
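
One way to read the numbers in the warning, assuming the values are reported in KB (an assumption; the KB article describes the limit itself), is to convert them to MB and see how far usage is past the soft limit:

```shell
# Rough arithmetic on the values from the original post, assuming the
# hostd warning reports them in KB (integer division by 1024 gives MB).
echo $((169812 / 1024))   # current value, about 165 MB
echo $((122880 / 1024))   # soft limit, 120 MB
```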

Rob_Bohmann1
Expert

As Troy mentioned, this message concerns the Service Console memory.

DusanPohl
Contributor

OK, here it is:

All ESX servers are identical (HP ProLiant DL580 G5) and configured as follows:

1. Memory: Total 36861.3MB / System 2799.3MB / Virtual Machines 33790MB / Service Console 272MB

2. CPU: Model Intel Xeon / Processor speed 2.1GHz / Sockets 4 / Cores per socket 4 / Logical processors 16 / Hyperthreading disabled

3. Storage adapter: QLA2432

4. Storage: Fibre Channel SAN storage

Each ESX hosts 3 VMs (W2K3 Standard), each configured with 2 vCPUs / 4096MB RAM / overhead 144.32MB / VMware Tools installed / a separate 752GB LUN per VM

DusanPohl
Contributor

Thanks guys, I will have a look at it and post back whether it helped.

DusanPohl
Contributor

I increased the Service Console memory as suggested and, additionally, deactivated DRS for troubleshooting purposes only. The messages are gone now, but this morning another VM became unresponsive.

So I checked the logs. Here is what I found in /var/log/vmkernel for that machine; I'm now going to check the other logs as well.

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu10:1073)World: vm 1073: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu11:1074)World: vm 1074: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu8:1088)World: vm 1088: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu8:1087)World: vm 1087: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.427 cpu13:1086)World: vm 1086: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.453 cpu5:1091)World: vm 1091: 3864: Killing self with status=0x0:Success

Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.453 cpu6:1090)World: vm 1090: 3864: Killing self with status=0x0:Success

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.335 cpu5:1071)World: vm 1114: 690: Starting world vmm0:SWP-VMMU004 with flags 8

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4836: adding 'vmm0:SWP-VMMU004': group 'host/user': cpu: shares=-3 min=0 max=-1

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4849: renamed group 14 to vm.1071

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.336 cpu5:1071)Sched: vm 1114: 4863: moved group 14 to be under group 4

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.342 cpu5:1071)Swap: vm 1114: 1426: extending swap to 4194304 KB

Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.350 cpu5:1071)World: vm 1115: 690: Starting world vmm1:SWP-VMMU004 with flags 8
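
A quick way to see whether all of a VM's worlds died at the same moment is to group the "Killing self" events by syslog timestamp (a sketch using a captured sample; on the host, LOG would be /var/log/vmkernel):

```shell
# Sketch: group world-kill events by syslog timestamp to show how many
# worlds died at once (a 2-vCPU VM runs several worlds). A captured
# sample stands in for /var/log/vmkernel so the snippet is runnable.
LOG=vmkernel-sample.log
cat > "$LOG" <<'EOF'
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu10:1073)World: vm 1073: 3864: Killing self with status=0x0:Success
Jan 26 11:35:32 swp-esx0005 vmkernel: 4:20:00:24.409 cpu11:1074)World: vm 1074: 3864: Killing self with status=0x0:Success
Jan 26 11:35:36 swp-esx0005 vmkernel: 4:20:00:28.335 cpu5:1071)World: vm 1114: 690: Starting world vmm0:SWP-VMMU004 with flags 8
EOF
# print "count  Mon Day HH:MM:SS" per burst of kills
grep "Killing self" "$LOG" | awk '{print $1, $2, $3}' | sort | uniq -c
```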

What else should I do?

DusanPohl
Contributor

Also, please have a look at all these messages from hostd:

Task Created : haTask-480-vim.VirtualMachine.reset-456

Reset request recieved

Registered Foundry callback on 2

Adding task: haTask-480-vim.VirtualMachine.reset-456

VM State transition requested to VM_STATE_RESETTING

Event generated

Event 6 : SWP-VMMU004 on swp-esx0005.deutschepost.dpwn.com in ha-datacenter is reset

State Transition (VM_STATE_ON -> VM_STATE_RESETTING)

VM State transition post act for VM_STATE_RESETTING

Resetting power state handler

CheckLicenses: Checking licenses based on VM features used.

GetHardwareCapability: Checking VM Feature Utilization.

GetHardwareCapability: VM Capability used : san

GetHardwareCapability: VM Capability used : vsmp

Reset request queued

Tracking progress for method : vim.VirtualMachine.reset

Retrieved current power state from foundry 1

Updated VM state machine with new power state: 1

DISKLIB-VMFS : "/vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004-flat.vmdk" : open successful (17) size = 17179869184, hd = -1. Type 3

DISKLIB-VMFS : "/vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004-flat.vmdk" : closed.

Updating config data

Time to gather config: 20 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966136898465 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Tools are not operations ready

New MKS connection detected.

Updating config data

Time to gather config: 17 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966136943148 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Retrieved current power state from foundry 1

Updating config data

Disconnect check in progress: /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx

Time to gather config: 17 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966136993930 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Retrieved current power state from foundry 1

Updated VM state machine with new power state: 1

Completed operation

Resetting power state handler

Retrieved current power state from foundry 1

Refreshing powerstate: 1, 0

VM State transition requested to VM_STATE_ON

Event generated

Event 7 : SWP-VMMU004 on swp-esx0005.deutschepost.dpwn.com in ha-datacenter is powered on

State Transition (VM_STATE_RESETTING -> VM_STATE_ON)

VM State transition post act for VM_STATE_ON

Updating current power state: 1

Posting vmevent to '/vm/runtime/powerop/':

timestamp = 1232966137045560 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx oldPowerState = 1 newPowerState = 1

UpdateOverhead: is64Bit = false

Task Completed : haTask-480-vim.VirtualMachine.reset-456

Removing task: haTask-480-vim.VirtualMachine.reset-456

Received state change for VM '480'

Retrieved current power state from foundry 1

Adding vm 480 to poweredOnVms list

Retrieved current power state from foundry 1

Updating config data

Time to gather config: 18 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966137115489 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Updating config data

Time to gather config: 26 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966137164131 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Updating config data

Time to gather config: 19 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966137205985 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Updating config data

Retrieved current power state from foundry 1

Time to gather config: 17 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966137246111 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Retrieved current power state from foundry 1

Retrieved current power state from foundry 1

Retrieved current power state from foundry 1

Running status of tools changed to: notRunning

Task Created : haTask-ha-root-pool-vim.ResourcePool.updateConfig-458

Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateConfig-458

Updating config data

Time to gather config: 18 (msecs)

Posting vmevent to '/vm/reconfig/':

timestamp = 1232966137344113 VM ID = 480 VM cfg Path = /vmfs/volumes/47d0e4b4-b44691c8-d099-001cc4934778/SWP-VMMU004/SWP-VMMU004.vmx: reconfigure

Sending notification failed to receiver post. Status: 1, Command output: snmptrap: Unknown host (Resource temporarily unavailable)
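
Separately, that last line shows snmptrap failing with "Unknown host", which usually means the configured SNMP trap receiver's hostname does not resolve from the Service Console. A quick check ("receiver.invalid" is a placeholder for whatever receiver is configured on the host):

```shell
# Verify that the SNMP trap receiver's name resolves from the Service
# Console. "receiver.invalid" is a placeholder hostname; substitute the
# receiver actually configured for this host.
getent hosts receiver.invalid || echo "trap receiver does not resolve"
```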
