Hello,
I know this problem seems to be common, but either there are no solutions available or the solutions are not applicable.
I will start with the hardware and software: we are running ESXi 4 (latest patch level). The problem is occurring on more than one blade,
but I will describe the scenario with one blade (the other blades have roughly the same VM setup; the hardware is identical).
We do not have any special resource/share configuration; all share settings are as on a fresh ESXi installation.
Hardware:
- BL460c G1, 2 x Xeon E5430, 16 GB RAM, 2 local SAS disks, 4 NICs (2 for network, 2 for NAS)
- NetApp filer
The datastores are mounted via NFS.
VMs on the blade:
- Linux (4 GB RAM, 2 CPUs)
- Linux (2 GB RAM, 2 CPUs)
- Linux (2 GB RAM, 2 CPUs)
- Windows Server 2008 R2 (4 GB RAM, 2 CPUs)
- all VMs have 4 MB video RAM (maybe this is the problem?)
All VMs have one or more e1000 interfaces configured. Some VMs were migrated from physical machines; the Windows 2008 R2 server was freshly installed two weeks ago.
On no blade do we exceed our 8-core / 16 GB RAM limit. It doesn't matter whether Windows 2003, 2008, or 2008 R2 is running. None of the Linux machines start a GUI; all run in runlevel 3. The Windows servers have VMware Tools installed.
Now, from time to time (sometimes after 10 days, sometimes after 2 months), we can't connect to the Windows machines anymore, neither over the console nor over RDP. When trying with the vCenter console we get the message "unable to connect to the mks: error connecting to /bin/vmx process". When this happens, all Windows machines on the blade are affected: they are no longer pingable, they are just dead. We can't shut down the VMs (it stops at 95%). All we can do is reboot the whole ESXi server.
any ideas?
Additional information: when our problem occurs, we cannot:
- run the vSphere updater when checking for updates (it hangs at "Status: scanning...")
- export system logs anymore (we stopped waiting after 15 minutes)
- reboot the host; we have to cold-reset the blade.
We also get lots of: cpu1: 19413555)FSS: 1073 1 4 e9 0 0 0 0 0 . . . . . not a directory
On Alt-F1 I just see some services shutting down; the last is ntpd, then nothing more happens.
-andy
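In case it helps anyone narrow down the timing: those FSS errors can be counted in a saved copy of the host log to see whether they start long before the actual hang. This is just a sketch; the log path (`/var/log/messages`) and the exact line format are assumptions about an ESXi 4 build, so adjust for yours:

```python
import re

# The thread reports vmkernel messages like:
#   cpu1: 19413555)FSS: 1073 1 4 e9 0 0 0 0 0 . . . . . not a directory
# Count such lines in a saved log copy to see when they begin.
FSS_RE = re.compile(r"FSS: .* not a directory")

def count_fss_errors(lines):
    """Return the number of 'FSS ... not a directory' log lines."""
    return sum(1 for line in lines if FSS_RE.search(line))

# Usage (log path is an assumption):
#   with open("/var/log/messages") as f:
#       print(count_fss_errors(f))
```

If the count is already climbing days before a crash, that would at least give you a warning signal to watch for.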
Did you check the DNS configuration of vCenter and the guest system?
DNS is set up correctly on all blades.
In addition, we use IP addresses to connect to the NFS datastore (just in case there is a DNS problem).
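One way to go beyond "DNS looks right" is to verify that forward and reverse lookups agree for every host involved (vCenter, each ESXi host, the filer). A small sketch; the resolvers are injectable so it can be tried offline, and any hostnames you pass in are of course your own:

```python
import socket

def dns_consistent(hostname,
                   forward=socket.gethostbyname,
                   reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """True if `hostname` resolves and the PTR record maps back to it."""
    try:
        ip = forward(hostname)
        name_back = reverse(ip)
    except OSError:
        return False
    # Compare short names so 'esx1' matches 'esx1.example.com'.
    return name_back.split(".")[0].lower() == hostname.split(".")[0].lower()

# Usage (placeholder names):
#   for host in ["vcenter.example.com", "esx-blade1.example.com"]:
#       print(host, dns_consistent(host))
```

A mismatch here would not explain a dead vmx process by itself, but it is quick to rule out.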
One strange anomaly: it seems that when the first MKS connection fails, all other blades suffer from the same issue within the next few days. Over the last 3 days we had consecutive crashes of all Windows VMs, all with the same error (the topic of this post).
This is just a feeling, but it has now happened for the second time: if one VM crashes, we can be sure other VMs will follow in the next hours or days.
-andy
any ideas?
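Since the crashes seem to cascade from VM to VM, it might be worth watching all the Windows VMs with a simple ping sweep and alerting as soon as the first one stops answering. A rough sketch; the `ping` flags are for a Linux monitoring box and the probe is injectable, so the host list and alerting are up to you:

```python
import subprocess

def ping_once(host):
    """Single ICMP echo with a 2-second timeout (Linux `ping` flags)."""
    return subprocess.call(["ping", "-c", "1", "-W", "2", host],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0

def newly_down(hosts, previous_state, probe=ping_once):
    """Return (current_state, hosts that answered last round but not now)."""
    current = {h: probe(h) for h in hosts}
    lost = [h for h in hosts
            if previous_state.get(h, False) and not current[h]]
    return current, lost

# Usage: run in a loop (e.g. every minute), carry `current` forward as
# the previous state, and alert as soon as `lost` is non-empty.
```

That would at least confirm (or refute) the feeling that one crash predicts the others.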
We are having this same issue with ESXi 4 and three Windows 2003 servers. One failed; I tried to fix it and gave up. I began bringing another back to life to take its place, and it started having the same issues, so only one is left. Scared to look at it. I have not found anything in any of the forums pointing to an answer. CB
I know it is not the fine way, but... BUMP.
Any ideas from anyone?
I am experiencing a similar problem in my HP Blade environment. In my case I have one virtual VC3 server running inside ESXi 4, on a Proliant BL460c (ROM Version I15 11/02/2008). Error description:
1. Veeam monitor report - "Connection problems" "not responding".
2. Could not log on to the virtual VC3 server using MSTSC or the VC client (VC error: "Unable to connect to the MKS: Error connecting to /bin/vmx process...").
3. Attempting to suspend the virtual VC3 server on the ESXi 4 host using the VC client results in: a) the suspend process hangs; b) Veeam Monitor reports "Connection problems" / "not responding" for the ESXi host machine.
4. Restarting the management network and management agents from the ESXi management console does not help; the "Test Management Network" function is unable to ping the VC3 IP address.
5. Restarting the ESXi 4 server via the management console results in an ESXi server hang.
6. The last resort was a system reset on the ProLiant BL460c through the iLO console, which luckily worked, but I would hate to think of the consequences if there were more than one virtual server running on the host.
Footnote: I also have ESXi 3.5 running on a Dell PowerEdge 1950 where Veeam Monitor reports "Connection problems" / "not responding", but in this case I am still able to log on to the virtual servers hosted on that machine.
Is this mostly an HP issue, or a general problem?
Hi guys, I don't know whether this will help you all or not, but I am using ESXi 4.0 with vSphere.
I had the same issue with /bin/vmx and also couldn't shut down the VM. I right-clicked the VM and installed VMware Tools, and the error is no longer appearing; I can start up, shut down, view the console, etc.
Might be worth a try.
Hello,
we do have VMware Tools installed.
Would I be correct to assume that this issue is still unresolved? I am experiencing the exact same scenario. The only difference I have is hardware and vmware version.
Hardware: two HP ML350 G5s connected to HP shared storage. One server has 19 GB RAM (vmserver1) and the other has 21 GB RAM (vmserver2); that is the only difference between the two servers.
VMware: both servers running ESX 3.5.
Both servers host three Windows Server 2003 VMs.
We have never had any issues with vmserver2. Vmserver1, however, is the one experiencing the error. I'm relatively new to VMware, so my knowledge is limited. Is there any logging I can turn on (or access, if it's enabled by default) that might help determine the source of this issue? Has anyone had any luck at all with correcting this issue? Thanks.
I am having a very similar problem with some VMs. Here's my story:
I was running Windows 2003 R2, 2008, etc. with no problems for over a year on:
HP ML370 G5 running ESXi 3.5.0
accessed with the vSphere client.
The VMs are stored on brand-new Dell EqualLogic storage units and we are using iSCSI? and ???
Not all VMs have the problem. Outlook also has an issue where it randomly asks for your password to reconnect, even after you enter the correct one.
Then, about 2 months ago, we migrated to ESX 4.
The Converter tool was used twice: once to convert physical servers to VMs to run under ESX 3.5, and a second time to update the VMs to run under ESX 4.
I noticed that two VMs which had HP/Compaq server-specific utilities installed have the issue at the console, with the exact error being:
unable to connect to the mks: error connecting to /bin/vmx process
Also, at least one machine loses its network connection randomly. I can't ping it (my desktop can, however, ping other servers on the same physical blade). The ESX 3.5 host is still up and running, but the VMs are offline or powered off, etc. We did have the clock-drift issue, but we think that is solved.
Also, things can really slow down in the vSphere client console: powering off a Windows 2003 server just as it was starting to boot took several minutes, when it should have been instant. Then I saw the /bin/vmx connection error.
The network card is a prime suspect. It might be that the old drivers for the "physical" NIC need to be uninstalled or removed from Windows. I noticed the "HP Network Configuration Utility" (description: "Allows you to team together Compaq NICs for the purposes of network fault tolerance and load balancing") in the properties of Local Area Connection 3. It is not checked, but could it be causing a problem? I will post updates when this is solved.
Let's solve this! And thanks to everyone who has posted so far.
Paul
San Diego, CA USA
It doesn't matter whether machines are migrated or newly installed.
Can you please check the video settings on your "crashing" VMs and post the video memory size here?
Thanks
-andy
My VMs never crash; there is just a mysterious hesitation. All the computers that I have trouble pinging are on the same Dell blade. All the VMs are set the same: default video, 8 MB of memory, 800x600, 24-bit color.
Hi Andy,
I'm having a very similar issue.
I also have a BladeSystem c3000 ("shorty").
Did you manage to get a solution for this?
I have logged a call with VMware support.
I rebooted using the iLO, and the VMware guy WebExed into the system and exported the logs. This is the solution he asked me to do:
"The VMX file is corrupt. So power off the VM,recreate the virtual machine and point it to existing virtual disk. Please revert for any clarifications"
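For what it's worth, "recreate the virtual machine and point it to the existing virtual disk" boils down to registering a fresh .vmx next to the old disk. A minimal sketch of what such a file could look like for a hardware-version-7 (ESX/ESXi 4) Windows 2008 R2 guest; the display name, disk file name, and network name below are placeholders for your own values:

```
config.version = "8"
virtualHW.version = "7"            # hardware version for ESX/ESXi 4
displayName = "win2008r2-rebuilt"  # placeholder name
guestOS = "windows7srv-64"         # Windows Server 2008 R2 guest id
memsize = "4096"
numvcpus = "2"
scsi0.present = "TRUE"
scsi0.virtualDev = "lsisas1068"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "existing-disk.vmdk"   # point at the existing vmdk
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "VM Network"      # placeholder port group
```

In practice it is easier to do this through the vSphere client (New Virtual Machine, then choose "use an existing virtual disk"), which generates an equivalent file for you.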
Did you manage to find a solution? Maybe we could help each other.
I'm also getting our IT provider (Fujitsu) to assist on this issue.
Much appreciated,
ballebros
I have not found a solution yet. The latest updates seemed to make it better, but they didn't fix it. I don't think the VMX file is damaged; that would mean 60% of my VMs are corrupt, and even newly created VMs crash. So, no, I have no clue where the problem can be.
-andy