I have a problem with my VC 2.5 and ESX 3.5.
50% of all actions result in "An error occurred while communicating with the remote host". The second try mostly works; sometimes I have to try yet again.
I start an action, the host disconnects, and a few seconds later the host is back and I can give it another try. The error occurs on every host I have.
Can anyone help me, please?
For reference: http://communities.vmware.com/thread/149438
2 people in the above thread appear to have the same error.
One fixed it by installing the ESX certs on the VC server, the other still has the problem.
I am one of the people in the other thread seeing this issue.
This issue is still happening. We have reinstalled an ESX server, and I have regenerated and re-added the certs to VCS and restarted everything, but the SSL error and "stream ended" error are still occurring. It seems the certs are not being picked up for some reason.
If possible, what are the specs of your servers, switches, etc.? Are you using blades? I'm trying to narrow down why this happens on only a small portion of installations.
I have an SR open with VMware as well.
Welcome to the club.
I am using HP blades (ProLiant BL460c G1) and Brocade SAN switches. But I don't think this is a hardware problem; everything was running fine before, and I haven't changed anything in the hardware.
Make that a big "me too" number 5 or 6 or whatever. Same symptoms.
Dell blades, Cisco switches, EMC, Broadcom NICs.
I have two environments. My new data center running 3.5/2.5 has the problem; my old data center running 3.0.x/2.0.x doesn't. I am currently rebuilding the environment in the old data center on 3.5/2.5 and haven't seen the problem there yet, but I suspect I will. I'm betting a patch broke this.
This is the exact same issue that I am having. It started happening last Tuesday. I am running ESX 3i 3.5.0 B94430 and VC 2.5 B84767. In VC I have an HA cluster with two ESX hosts in it. I have been working on the issue pretty much non-stop and haven't found a solution yet.

One thing that I did pick up on: I used a protocol analyzer (Wireshark) and created a filter to monitor traffic between my ESX hosts and VC. I noticed that when the ESX hosts are grayed out and showing "disconnected" or "not responding", there are a large number of SSLv3 checksum errors happening on the VC server while an ESX host is communicating with it. It looks like the problem is a TCP windowing issue. I don't know how to fix it, but I thought I would shed some light on it. It could even be nothing, but it seems kind of weird to me.

I have ensured that all NICs are supported and have the latest firmware and drivers. To me the issue is with VC and HA: somehow VC is not receiving a heartbeat from the ESX hosts. To rule out hardware and network issues, I am going to install ESX 3.5 Update 1 and test it. The next test I am going to do is set up another network switch to ensure that it isn't a network issue. Anyway, hope we get this resolved. Thanks for your support.
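Since the SSLv3 errors above point at the handshake between VC and the hosts, one way to separate plain connectivity problems from SSL-layer problems is a small probe run from the VC server. This is only a rough sketch, not a VMware tool: the host name is a placeholder, and the ports (443 and 902, commonly used for ESX management traffic) are assumptions to adjust for your setup.

```python
# Hedged diagnostic sketch: distinguish "TCP connect fails" from
# "TCP works but the SSL handshake fails" for an ESX host's ports.
import socket
import ssl

def probe(host, port, timeout=5.0):
    """Return 'ok', 'tcp-failed', or 'ssl-failed' for host:port."""
    try:
        # Plain TCP first: rules out routing/firewall problems.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"
    try:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE  # ESX hosts ship self-signed certs
        # wrap_socket performs the handshake immediately.
        with ctx.wrap_socket(sock, server_hostname=host):
            return "ok"
    except (ssl.SSLError, OSError):
        return "ssl-failed"
    finally:
        sock.close()
```

Usage would be something like `probe("esx-host.example.com", 443)` from the VC server; a result of `"ssl-failed"` while `"tcp-failed"` never occurs would point the finger at the certificate/SSL layer rather than the network.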
According to Windows Process Explorer, the VC process vpxd also uses the DLL Crypt32.dll.
Are there any errors related to its use in the Windows server's event logs (Application, System, or Security)?
After carefully examining the Application, Security, and System logs, I do not see any errors related to crypt32.dll in my VC server's event logs. In addition, since my logs are permanently archived, I looked at all of the logs dating back to when the server was first put online, and they too show nothing regarding crypt32.dll. Thanks for your support!!
No luck yet!! I have a case that has been open for about a week and a half now. I haven't been shifted around too much, but the VMware folks are clueless so far. They haven't been helpful, but I know they are working on it. My belief at this point is that it's the SSL certs, or possibly a DB connectivity problem, maybe even both, who knows. I can say that I am about to lose my patience very soon if I don't get any answers. Hopefully if we just keep working the problem it will be figured out, so hang in there.
No luck here either.
The case has been open for 1.5 weeks now. Current state of affairs:
"From the logs, the issue is matching VMware Problem Report #268505, which is currently unresolved"
Can't find the Problem Report; does anyone know where it is?
I just phoned VMware Support, and we tried a workaround:
- Uninstall "Update Manager" and "Converter for VirtualCenter"
- Reboot the VC Server
I checked some actions after that and everything seems fine.
Now I'll wait 24 hours and then contact VMware again to get further instructions.
I'll post any further developments.
Hey, thanks for the info. I have uninstalled "Converter" and "Update Manager" for both VC and the client. I am rebooting the VC server. I will keep you posted with the results. Good work!!
So far so good. That seems to be doing the trick, at least until VMware can work out the bugs. I am actually going to clone my existing VC server, then blow it away and start fresh. Hopefully I can find out where it all goes bad.
After nearly 24 hours everything is running fine, no errors at all.
I have installed the Update Manager on the VirtualCenter Server again. Everything is still running fine so far; I'll keep an eye on it.
It seems to be the Converter that is causing the problems.
Good work!! Everything is working perfectly at this point without "Update Manager" or "Converter". I am going to reinstall "Update Manager" and confirm. Once I have confirmed that everything is OK with "Update Manager" installed, I will reinstall "Converter" and attempt a P2V conversion of a host to see what happens.

Now that my memory serves me correctly, I recall doing a conversion of a physical host the night before the problem started to occur. Right before I went home for the day I started a conversion of a host in VC and then left, because it was going to take quite a bit of time. The next morning everything in VC was all horked up.
Another 24h later:
Everything is still running fine, including "Update Manager".
I'm not going to reinstall "Converter" now, because I don't need it right now and I'm glad to have a stable VI again.
Reg is doing this right now (Thanks!), so he can give us a report.
I'm still in contact with VMware and I'll post any news.
I have reinstalled "Converter" and the "Converter Agent" on my VC host and have successfully duplicated the problem; at this point I have confirmed that something in "Converter" is the cause. The workaround is, of course, to remove Converter and run it on another host.
Course of action:
Reinstalled Converter and rebooted the host. Everything seemed to be fine, but I noticed the "VMware Virtual Infrastructure Client" starting and running much more slowly on my remote hosts. In addition, the VI Client would start up sluggishly on the VC server; however, the hosts did not disconnect or show a status of (Not Responding). Perhaps I didn't give it enough time.
Attempted a conversion of a Windows XP physical host to a VM while connected to the VC server through the "Virtual Infrastructure Client" on my workstation. The conversion process failed. Soon after that, the ESX hosts were showing a status of (not responding) at random intervals. Once again I removed "Converter" and the "Converter Agent", rebooted my VC server, and everything is working fine again. :D
Something in "Converter" is clobbering the communication between VC and the ESX hosts.
Thank you, Reg, for the test and the final confirmation that Converter causes the problem.
I'll mark this thread as "question answered" now, because we have a working workaround.
Now we have to wait for VMware to fix it.
If there is any news, I'll post it here.