VMware {code} Community
paulbb
Contributor

Using VMCI streams, Connect fails to return on client

I have implemented a client/server application that uses VMCI to pass data between a server VM, which has a hardware card assigned via VM Direct Path, and multiple client VMs. I pass 1 MB packets from the server to the clients and get pretty good performance (20 Gb/s to one VM, 40 Gb/s aggregate to several VMs). The 1 MB packets give me the best performance, and I might go to larger packets (2 MB or 4 MB) to improve it further.
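
For context, the client side does roughly the following (a simplified sketch, not my exact code; the server context ID and port below are placeholders):

    /* Simplified sketch of the client connect path, using the VMCI Sockets
       API from vmci_sockets.h. SERVER_CID and SERVER_PORT are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include "vmci_sockets.h"

    #define SERVER_CID  10      /* context ID of the server VM (placeholder) */
    #define SERVER_PORT 15000   /* port the server listens on (placeholder)  */

    int vmci_connect(void)
    {
        struct sockaddr_vm addr;
        int af, fd;

        af = VMCISock_GetAFValue();          /* address family for VMCI sockets */
        fd = socket(af, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            return -1;
        }

        memset(&addr, 0, sizeof addr);
        addr.svm_family = af;
        addr.svm_cid    = SERVER_CID;
        addr.svm_port   = SERVER_PORT;

        /* This is the call that sometimes never returns. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect");
            close(fd);
            return -1;
        }
        return fd;
    }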

I am running ESXi 4.1.

My problem is that after several client applications have interacted with the server (connect, transfer data, close), a client connect() no longer returns with either success or an error; it just hangs. Sometimes, after waiting several hours, it magically starts working again. Rebooting the ESXi 4.1 host usually brings me back to a working state.

Rebooting the client VM does not fix the problem. Rebooting the server VM also does not fix the problem. Changing the BACKLOG value passed to listen() seems to have some correlation with the problem: if I set BACKLOG low, say 2, then after a couple of failed attempts from the client, connect() returns an error (Connection reset by peer).

I configure VMCI with a 2 MB maximum message size and a 1 MB default message size. I use pthreads to run 4 or more threads, each listening on its own port, so the server covers a range of 4 or more ports. All my VMs run Linux, either CentOS 5.x or Debian; I see more issues with the Debian clients.
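
I set these sizes through the VMCI socket buffer options, roughly like this (a simplified sketch; I'm quoting the option names and value types from vmci_sockets.h from memory, so treat them as approximate):

    /* Set the default and maximum VMCI stream buffer sizes on socket fd.
       Sketch only - the option level is the VMCI address family and the
       values are 64-bit, as I recall from vmci_sockets.h. */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include "vmci_sockets.h"

    static int set_vmci_buffers(int fd, int af)
    {
        uint64_t def = 1 * 1024 * 1024;   /* 1 MB default buffer */
        uint64_t max = 2 * 1024 * 1024;   /* 2 MB maximum buffer */

        if (setsockopt(fd, af, SO_VMCI_BUFFER_SIZE, &def, sizeof def) != 0 ||
            setsockopt(fd, af, SO_VMCI_BUFFER_MAX_SIZE, &max, sizeof max) != 0) {
            perror("setsockopt");
            return -1;
        }
        return 0;
    }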

After my server detects a close from the remote end, I shut down the socket, then wait on an accept() call for the next connection request on that port. This usually works a couple of times; then I get no notification on the server, and the client hangs.
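
Each per-port server thread follows roughly this pattern (a simplified sketch using the same headers as the client sketch above; error handling trimmed, and BACKLOG is my own macro):

    /* Per-port server thread: bind/listen once, then accept, serve, and go
       back to accept() after the remote end closes. Simplified sketch. */
    void *port_thread(void *arg)
    {
        struct sockaddr_vm addr;
        unsigned int port = *(unsigned int *)arg;   /* this thread's port */
        int af  = VMCISock_GetAFValue();
        int lfd = socket(af, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof addr);
        addr.svm_family = af;
        addr.svm_cid    = VMADDR_CID_ANY;           /* accept from any context */
        addr.svm_port   = port;

        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, BACKLOG);

        for (;;) {
            int conn = accept(lfd, NULL, NULL);     /* this is where notifications stop arriving */
            if (conn < 0)
                break;

            /* ... send 1 MB buffers until recv() reports the remote close ... */

            shutdown(conn, SHUT_RDWR);
            close(conn);
        }
        close(lfd);
        return NULL;
    }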

Are there any configuration guidelines for VMCI? Any guidelines for multi-threaded VMCI server?

Is this the right place to post this problem?

28 Replies
paulbb
Contributor

I notice that when I run "dmesg", I see the following error messages when the VMCI connect doesn't work:

VMCIQueuePair: VMCIQueuePairAlloc_HyperCall result = -37.

VSock[12]: Could not attach to queue pair with -37

Any suggestions?

svaerke
VMware Employee

Is the dmesg output you mention on the server side (the one using VM Direct Path)? If so, I suspect that there is an issue with the concurrent use of VM Direct Path and VMCI Sockets. Could you take a look at the vmware.log of the server VM and also at the log on the ESXi host? Also, you could try disabling VM Direct Path to verify whether it is part of the problem.

paulbb
Contributor

The dmesg output is on the server-side VM, the one using VM Direct Path.

I did look at the log on the ESXi host for the server VM, and there are no messages generated.

Where is the vmware.log file on the Linux VM? I tried looking for it and can't find it.

Yes, I am using VM DirectPath, and I believe my problems started when I used the card with my application. However, that is the entire reason for using VMCI: I have a 4-port network card, and I want to direct one port to each of 4 VMs.

Is there a known issue with VM DirectPath and VMCI sockets? How can I get this fixed? I don't believe there is any other mechanism that gives me high performance / high throughput, say 40 Gbit per second. I also don't know how to write a driver that runs in ESXi that could own the PCIe card and present virtual interfaces to the VMs, and that solution would be dependent on ESXi, which would make it very hard to deploy to an external customer base.

svaerke
VMware Employee

You should be able to find the vmware.log file in the same directory as the .vmx file of your server VM.

There aren't any known issues with VM Direct Path and VMCI. Until we know a bit more about what the actual problem is, it is hard to suggest any workarounds.

paulbb
Contributor

OK, then it was the vmware.log file that I looked at, on the ESXi host file system for the VM that is running as the server. No, there are no messages generated in the log file.

What I see happening on the server VM is that VMCI clients can connect, transfer data, and close the connection. Sometimes 1 or 2 client sessions can connect and close, other times 5 or 6 (not necessarily all concurrent). But at some point a connect fails to get through, and the -37 message appears in the Linux dmesg output on the server. Once this happens, none of the client VMs can connect, sometimes for a short while, sometimes for a long while.

Sometimes when I restart my server app, using a different port, it works, and sometimes it does not. Rebooting the client does not work. Rebooting the server VM sometimes works. Rebooting the ESXi host always works.

So how can I debug this further? Where is error -37 documented? Is it related to the size of the buffers I am using; should I use larger buffers? Is there any configuration that needs to be changed, given that I want larger buffers than the 256 KB default?

paulbb
Contributor

I modified my server application to optionally work without the DirectPath device. The DirectPath device is still connected to the VM, but if I don't utilize it, I don't get the error, as far as I could see when I tested.

When I went back to using the DirectPath device, I didn't see any issues for a long time. Then I went for lunch, and when I returned and ran my client VM, I got the error again:

VMCIQueuePair: VMCIQueuePairAlloc_HyperCall result = -37.

VSock[12]: Could not attach to queue pair with -37

The other change I made was that I took my server VM from 4 cores down to 2.

So without more prolonged testing I don't know if I have narrowed anything down, but it does look like I don't see the problem if I don't utilize the DMA transfers of my DirectPath device. Does this give anyone a hint?

svaerke
VMware Employee

Thanks for investigating this further. It does sound like VM Direct Path is what is triggering the VMCI queue pair allocation error. You mention that you also changed the number of cores from 4 to 2, and following that it took longer for the problem to occur. Could you change the number of cores back to see whether the frequency of the problem is affected by the number of cores allocated to the server VM? Also, instead of rebooting the ESXi box when the problem occurs, could you try powering the server VM off and back on (this cleans up more VM state than a reboot from inside the VM)?

To give a bit of background: the VMCI stream sockets use VMCI queue pairs as their data transfer mechanism. The VMCI driver in the guest allocates a range of pages and allows the VMCI device to DMA into those pages. However, these pages must not be in use by any other device (and normally they won't be). The VMCI device verifies this during the queue pair allocation (the call to VMCIQueuePairAlloc_HyperCall), and if it thinks that another device is using them, it returns the -37 error (essentially, the VMCI device examines reference counts kept on the VM's pages to determine this).

When using VM Direct Path, the VM's initial configuration is that all of the VM's memory is accessible to the directpath device, so we take care to remove the pages used by the VMCI device from the set of pages accessible to the directpath device, and to add them back when the VMCI device no longer uses them. This happens for the VM as long as a directpath device is assigned to it, regardless of whether the device is in active use.

So from what you are describing, it may be that actually using the directpath device marks some pages as in use by the directpath device and that these pages are not dealt with properly when using them for the VMCI device. Does your server application have any special handling of the directpath device, e.g., some form of zero-copy data transfer or similar? Also, is the driver for the directpath device loaded even when you don't utilize it?

I also have a question about the hanging clients. When you reboot a client VM and try to connect again, I assume that the -37 error occurs again if the client is unable to connect?

paulbb
Contributor

Funny: I left my server VM powered up overnight. When I left, it was hitting the -37 error. When I came in this morning, I fired up a client, and the server responded without the error occurring.

My directpath card is a high-performance NIC. I can configure the number of software ports, up to 32, and the number and size of buffers that are DMA'ed to the user. I am using 4 ports, each with 16 buffers of 1 MB in size. The card supports 1/2/4 MB buffers and handles 16 buffers per port in hardware. I choose when to load the driver, and the buffers are not allocated until I open the respective ports. I can specify whether to reserve xxx MB of RAM for the buffers, which the driver will manage, or have the driver just allocate memory as needed.

When my client shuts down, the server code closes the software port, which stops the hardware from DMAing into memory. So overnight, the card was not DMAing at all. Does the reference count age over time?

I will continue with some testing today.

paulbb
Contributor

Some additional observations.

When the client hangs on the connect call and I reboot the client, the first connect does not hang but returns the error "Invalid argument". When I try again, the connect hangs, and the server reports the -37 error in dmesg.

If I shut the client down and power it back on, the first connect fails with "Invalid argument". On the second try I get "Connection reset by peer". The third time the connect hangs, and the server reports the -37 error in dmesg.

Yes, my NIC is doing zero copy, bypassing the kernel and delivering the buffer to the user application.

paulbb
Contributor

I do notice in the shutdown messages on the server that there is often an error when the VMCI interface is shut down. I don't know if that means anything (if the messages didn't scroll by so quickly, I would repeat the error here for you).

svaerke
VMware Employee

The reference counts do not age over time as such. What may happen is that a virtual device of the VM is caching a reference to a VM page and only when the page is evicted from the cache is the reference count decreased. Another possibility is that kernel memory allocated by a device driver in the server VM is freed while the device is still using it. If such a page is then reused by the VMCI driver, the VMCI device will detect that the page is still in use based on the reference count and return the error 37. Once the other device is finished with the page, the VMCI device should be able to use it.

However, in both of these cases, I would expect a reboot of the server VM to resolve the problem.

You may try making the high-performance NIC driver reserve all the memory it needs instead of allocating it on demand (you mention that as one of the two possible ways to use the driver). If there is an issue with how your high-performance NIC manages dynamic memory allocation, then having a static pool of memory might confine the problem somewhat.

If it is a problem with one of the virtual devices caching a reference to a VM page or similar, the best way forward is either that we try to reproduce your issue locally at our end so that we can inspect what is going on on the hypervisor side, or that we provide you with a modified version of ESXi 4.1 that contains additional logging or can generate a crash dump of the server VM when the error occurs.

svaerke
VMware Employee

You might be able to see the error message by manually unloading the VMCI socket and VMCI drivers by doing: rmmod vsock; rmmod vmci

paulbb
Contributor

Funny thing: on Monday I reduced the memory in the server VM from 3000 MB to 1500 MB, as it was a bit excessive, though I do have 24000 MB on the ESXi host. I also needed more client VMs, and since I can't figure out how to clone or copy a VM using vSphere or through telnet to the ESXi host, I created another 7 CentOS VMs, each single core. Using these as clients all day Tuesday, I didn't experience any errors. We'll see if this continues. I used to get the errors most frequently using a Debian client; maybe I will go back to that.

I am also going to try reserving the DMA memory for the VM DirectPath I/O card. The other thing I do is run the NetBeans IDE (and hence Java) along with the C/C++ compiler to rebuild my software. I haven't done much development while creating all those new VMs; perhaps there is a relationship there?

paulbb
Contributor

I wasn't running with my DirectPath device's buffers preallocated, and I got the -37 error. I ran rmmod vsock; rmmod vmci. Here is what was in dmesg:

VSock[3838]: Could not destroy VMCI datagram handle.

Removing vmci device

Resetting vmci device

Unregistered vmci device.

ACPI: PCI interrupt for device 0000:00:07.7 disabled

For some reason, this time I was able to run for several days without getting a -37 error. I will now try preallocating the DMA buffers for the PCIe device.

Does this give you any useful information?

Is there an issue with VMCI sockets that aren't closed down nicely? I have been debugging my server app, so it has died leaving sockets open to clients. It seemed that over the past couple of days, when I wasn't making any code changes, I didn't run across the -37 error. But today, once I started debugging some new code to allow multiple concurrent connections on the same port, and hitting a number of seg faults, etc., I ended up with the -37 error again.

svaerke
VMware Employee

We are not aware of any issues with VMCI not cleaning up connections when a process dies. Also, if the VMCI stream sockets weren't cleaned up properly, I would expect a different error message (basically, that would cause a situation similar to the one that leads to error -37, but we would detect it at an earlier step). Another possibility is that the zero-copy driver you are using isn't closing down nicely.

Are you using the version of VMware tools that came with ESXi 4.1? I'll look into the issue you see with the clients hanging. You should simply get a connect error in that case, and that is what happens when I try to reproduce it.

The error message from VSock is not related to your problem. It is trying to clean up a handle that has already been cleaned up.

paulbb
Contributor

Yes, I am using the VMware Tools that came with ESXi 4.1. I am running 64-bit CentOS 5.5.

paulbb
Contributor

Well, today things decided not to behave again, for no apparent reason. The ESXi 4.1 server was powered up today after a holiday rest, and I had been running fine for a couple of weeks.

If I do a dmesg on my server, I see "VMCIQueuePairAlloc_HyperCall result = -37." And "VSock[11]: Could not attach to queue pair with -37".

So if I kill the server, I would expect client requests to be rejected. Instead, the clients hang on a socket call, and dmesg on the server shows the errors:

   Destination handle 0x31072bee:0x1 doesn't exists.

   Datagram with resource 1 failed with err fffffff8.

I am running with the buffers for the DMA hardware preallocated, so there should be no memory conflicts.

Any suggestions on how to proceed?

svaerke
VMware Employee

I'm terribly sorry about the late reply - your update had fallen through the cracks.

The error you are seeing with respect to the client hanging is an unwanted side effect of the way the VMCI socket module registers the VMCI protocol. We are working on fixing these client hang issues in upcoming releases/updates. Unfortunately, fixing them will not fix the "VMCIQueuePairAlloc_HyperCall result = -37" error.

Are you still seeing the hypercall-related error (you mentioned previously that it showed up mainly when you were working on your multiplexing application)? If so, the best way forward may be to have you install an instrumented build of ESXi 4.1 that can give us more information about what happens when the queue pair allocation hypercall fails.

paulbb
Contributor

I am now seeing the error again. I was originally on a 2.27 GHz Nehalem, and when I started doing some performance testing on a 2.4 GHz Westmere I started to see the error appear, even though I wasn't doing any development. What was also new was that the error showed up when I had several VMCI connections open and another new connection then triggered the -37 error.

How would I go about getting an instrumented ESXi 4.1 and installing it?

Is there anywhere I can find out more about VMCI performance? I am seeing that I can get an aggregate of 20-30 Gb/s throughput from one server to multiple clients, and as the throughput approaches 20 Gb/s, the amount of CPU required increases non-linearly. Is the kernel copying the data, or just updating page tables between the VMs? I have tried 1024 KB, 2048 KB, and 4096 KB buffer sizes (the sizes supported by the adapter card, which DMAs into the buffers; I then pass those same buffers to the VMCI send routine, which may be setting off your alarm bells as to why the -37 is happening, but I own the buffer at the time I do the send call, and when the send returns, I return the buffer to the card's buffer pool for future DMA activity), and I can't say that I get better performance with larger buffers. Nor does adding more cores to my server seem to provide much additional throughput.
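
For reference, the data path on the server side looks roughly like this (a simplified sketch using the same socket headers as before; nic_port_t, nic_get_buffer(), and nic_put_buffer() are placeholders standing in for my card driver's API, not real function names):

    /* Simplified server data path: the card DMAs into a buffer, the same
       buffer is handed to the VMCI stream socket, and only after send()
       returns is the buffer given back to the card's pool.
       nic_port_t and the nic_* calls are placeholders for the driver API. */
    void pump(int conn_fd, nic_port_t *port)
    {
        for (;;) {
            void  *buf = NULL;
            size_t len = nic_get_buffer(port, &buf);  /* blocks for the next DMA'd buffer */
            size_t off = 0;

            if (len == 0)
                break;                                /* port closed */

            while (off < len) {
                ssize_t n = send(conn_fd, (char *)buf + off, len - off, 0);
                if (n <= 0) {
                    perror("send");
                    nic_put_buffer(port, buf);
                    return;
                }
                off += (size_t)n;
            }
            nic_put_buffer(port, buf);                /* back to the DMA pool */
        }
    }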
