I have just rolled out vCloud 5.5 and vSphere 5.5 and have struck an issue that recurs 13 days after every physical host reboot.
When trying to connect to a VM console using any of the clients [vCD / Web Client / C# Client] I get the following errors:
1. Blank Screen
2. Occasionally MKS malformed response from server
3. Unable to connect to the MKS: Connection terminated by server
I have tried migrating the VM to another host (made no difference) and restarting the management agents on the host (made no difference). Rebooting the host
and migrating the VM back to it fixed the issue. I suspect a bug in vSphere 5.5, as all [3] hosts had identical problems. So far I have struck this issue twice;
my servers have now been up for 7 days and I expect the 3rd occurrence on 13/11/2013 ;(
Issue started 16/10 08:30am [first noticed]
Server uptime all [3] hosts 13 days
ESXi build 5.5.0 1331820
Issue re-occurred 29/10 15:00 [second time]
Server uptime all [3] hosts 13 days
ESXi build 5.5.0 1331820
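For anyone wanting to work out when their own hosts are due to hit this, the arithmetic is simply the last reboot date plus the 13 days observed above. A minimal sketch, assuming the interval holds; the function name and the reboot date in the example are my own (the reboot date is not stated in the post, it is inferred by counting 13 days back from 13/11/2013):

```python
from datetime import date, timedelta

# Interval observed in this thread between a host reboot and the console failure.
FAILURE_INTERVAL_DAYS = 13


def expected_failure(last_reboot: date, interval_days: int = FAILURE_INTERVAL_DAYS) -> date:
    """Return the date on which the issue would be expected to reappear."""
    return last_reboot + timedelta(days=interval_days)


# Assumed reboot date (hypothetical, inferred from the expected 13/11/2013 occurrence).
assumed_reboot = date(2013, 10, 31)
print(expected_failure(assumed_reboot).strftime("%d/%m/%Y"))  # prints 13/11/2013
```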
Below is the syslog server capture taken when attempting a VM console connection.
<166>2013-10-15T23:08:39.722Z fcsesx01.fred.local Hostd: -->
<166>2013-10-15T23:08:39.722Z fcsesx01.fred.local Hostd: [3A160B70 verbose 'Hostsvc.ResourcePool pool18'] Added child 26 to pool
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [3A8D2B70 info 'Libs'] CnxAuthdConnect: Returning false because CnxAuthdProtoReadResponse2 failed
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [3A8D2B70 info 'Libs'] CnxConnectAuthd: Returning false because CnxAuthdConnect failed
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [3A8D2B70 info 'Libs'] Cnx_Connect: Returning false because CnxConnectAuthd failed
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [3A8D2B70 info 'Libs'] Cnx_Connect: Error message: Connection terminated by server
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] Foundry_[Create|Open]Ex failed: Error: (3008) Cannot connect to the virtual machine
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] Failed to load virtual machine.
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] Failed to load VM N5Vmomi5Fault11SystemError9ExceptionE(vmodl.fault.SystemError)
<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] Marking VirtualMachine invalid
<166>2013-10-15T23:08:39.724Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] State Transition (VM_STATE_INITIALIZING -> VM_STATE_INVALID_CONFIG)
<166>2013-10-15T23:08:39.724Z fcsesx01.fred.local Hostd: [FFE40B70 verbose 'Vmsvc.vm:/vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx'] Time to load virtual machine: 51 (msecs)
<166>2013-10-15T23:08:39.724Z fcsesx01.fred.local Hostd: [FFE40B70 info 'Vmsvc'] Loaded virtual machine: /vmfs/volumes/5248c6f0-6a442ffe-c35f-0017a47724d8/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980)/diskload (a33d96fa-47c9-4135-b05e-fd39ce56d980).vmx
<166>2013-10-15T23:08:39.724Z fcsesx01.fred.local Hostd: -->
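If you want to check your own syslog capture for the same failure, something like the following could pull out the relevant Hostd lines. This is only a sketch: the regular expression is inferred from the layout of the lines above, and the group names, the `MARKERS` tuple and the `console_failures` function are my own naming, not anything from VMware.

```python
import re

# Layout assumed from the capture above:
# <PRI>TIMESTAMP HOST Hostd: [THREAD LEVEL 'COMPONENT'] MESSAGE
LINE_RE = re.compile(
    r"^<\d+>(?P<ts>\S+) (?P<host>\S+) Hostd: "
    r"\[(?P<thread>[0-9A-Fa-f]+) (?P<level>\w+) '(?P<component>[^']*)'\] (?P<msg>.*)$"
)

# Message fragments taken from the capture that indicate a failed console connection.
MARKERS = (
    "Connection terminated by server",
    "Cannot connect to the virtual machine",
)


def console_failures(lines):
    """Return (timestamp, host, message) for each line containing a failure marker."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in MARKERS):
            hits.append((m.group("ts"), m.group("host"), m.group("msg")))
    return hits


sample = [
    "<166>2013-10-15T23:08:39.722Z fcsesx01.fred.local Hostd: "
    "[3A160B70 verbose 'Hostsvc.ResourcePool pool18'] Added child 26 to pool",
    "<166>2013-10-15T23:08:39.723Z fcsesx01.fred.local Hostd: "
    "[3A8D2B70 info 'Libs'] Cnx_Connect: Error message: Connection terminated by server",
    "<166>2013-10-15T23:08:39.724Z fcsesx01.fred.local Hostd: -->",
]
for ts, host, msg in console_failures(sample):
    print(ts, host, msg)
```

Continuation lines such as `Hostd: -->` do not match the pattern and are skipped.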
Any help would be appreciated.
I have also logged the call with HP, as we have software support with them. They mentioned that a number of sites have similar issues and that a call has been placed with VMware for further investigation.
Rgds
Mark Askham
Senior Infrastructure Architect
It should be called hpHelper, living under /opt/hp/hp-ams.
/etc/init.d/hp-ams.sh stop
To persist the change in the event of a host reboot, run the command:
chkconfig hp-ams.sh off
Also performed the commands that Gaurav_Baghla mentioned.
It only seems to affect our BL460c Gen8 hosts. Our BL490 G7 hosts have an uptime of 15 days but don't seem affected.
We're also running vCloud and this issue affects vShield too, resulting in labs without network connectivity.
We're on the latest VMware patch, which doesn't solve this issue.
HP, fix this!
I seem to have had a more severe crash of one of my hosts today. I'm unable to SSH to the server (I get a "connection refused" error), can't log into the direct console (it just hangs without bringing up a prompt), and can't vMotion any VMs off the host (I get an error, "A general system error occurred.")
The VMKernel log is filled with errors like this:
2014-01-28T16:04:49.874Z cpu16:3061156)WARNING: VisorFSObj: 1940: Cannot create file /var/run/sfcb/52c0b47e-eb5b-d2ce-1bde-8f8a13ed2f13 for process sfcb-CIMXML-Pro because the inode table of its ramdisk (root) is full.
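To see which process and directory are filling the ramdisk, the warnings can be tallied from a saved copy of the log. A rough sketch, assuming every warning has the same wording as the line above; the pattern is derived from that single line and `tally_inode_warnings` is a made-up helper name:

```python
import posixpath
import re
from collections import Counter

# Pattern derived from the single VMkernel warning quoted above.
INODE_RE = re.compile(
    r"Cannot create file (?P<path>\S+) for process (?P<proc>\S+) "
    r"because the inode table of its ramdisk \((?P<ramdisk>[^)]+)\) is full"
)


def tally_inode_warnings(lines):
    """Count inode-table-full warnings per (process, directory, ramdisk)."""
    counts = Counter()
    for line in lines:
        m = INODE_RE.search(line)
        if m:
            directory = posixpath.dirname(m.group("path"))
            counts[(m.group("proc"), directory, m.group("ramdisk"))] += 1
    return counts


sample = [
    "2014-01-28T16:04:49.874Z cpu16:3061156)WARNING: VisorFSObj: 1940: Cannot create file "
    "/var/run/sfcb/52c0b47e-eb5b-d2ce-1bde-8f8a13ed2f13 for process sfcb-CIMXML-Pro "
    "because the inode table of its ramdisk (root) is full.",
]
for (proc, directory, ramdisk), n in tally_inode_warnings(sample).items():
    print(proc, directory, ramdisk, n)
```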
Anyone have any ideas how to stop the hp service without bringing down the host? Of course there are several high profile virtual machines on it that can't incur any downtime...
I have a ticket open with VMware support but their response time is proving horrible, over 24 hours so far.
Steve
Does anyone know if you can use PowerCLI to stop the HP AMS service somehow? For some reason, I can still connect to the host using PowerCLI.
Hello. I had the same problem (black/blank console in the vSphere Client). In my case, the cause was a DNS problem.
Try this: log in with the vSphere Client using the IP address.
Good luck!
This is definitely not a DNS issue.
I agree, this is not a DNS issue. stephenrbarry is right.
I have an HP BL460 G8 blade server, as well as HP DL360 G7, Dell R710 and Dell R410 servers.
All servers have ESXi 5.5 installed, with the vCenter 5.5 virtual appliance.
But only the BL460 G8 model has the problem.
Is anyone hearing any updates from HP or VMware regarding this?
And yes, while some users may be experiencing similar issues and can resolve them in different ways, this specific issue appears to only impact HP Gen8 blades running HP's custom ISO.
HP has finally confirmed to me that they were able to reproduce the error in their lab, but they haven't been able to isolate "why" yet.
Here's a recap of what I know:
On one particular server, I had stopped and restarted the hp-ams.sh service after it had bugged, but the issue never came back until a reboot (+ the two week delay). This could just be coincidental, as I only did it on one host.
Corrective actions:
Cheers
-Joshua
I removed the hp-ams service before my last reboot and today is day 14 and the issue has not come back.
HP told me to run the following commands from the console:
esxcli software vib list
(You will see the hp-ams service listed)
esxcli software vib remove -n hp-ams
(Will remove the service and the system will then require a reboot)
HP support are due to come back to me tomorrow to check if that has worked. They will insist that this is a VMware issue and that I need to log the problem with VMware. I guess that is the next step, but I fear I am going to get stuck between HP and VMware support, who will end up blaming each other.
I will keep you all updated if I get any more information from either.
HP just released an AMS update 9.5.0 (18 Feb 2014) along with other driver updates. I'll give it a go this week.
The following fix has been made to HP Agentless Management Service:
I just looked at my downloads and saw that HP released an updated customized ISO today. Looks like this could finally be the solution we've been waiting for...
Don't ever let HP give you the runaround. If you have purchased your support from HP they are obligated to provide you with a solution. They have their own internal contacts with VMware and can work with them to find a solution for you. My reseller taught me that I am sometimes too polite. Let them know you are dissatisfied with their level of customer service and that you would like to speak with their supervisor. There is no need to be caught in the middle.
My environment started acting similarly after upgrading to vSphere 5.5. But I don't get the MKS error that often, I just get a black screen on the console tab. We have all Dell hardware, no HP whatsoever. I can be working fine on a VM in the console tab, then click on another VM, click back and it's blank and never comes back. The only way to get it back is to right-click and "Open Console"; then the console in the tab will reappear. Very annoying.
15 days ago, I updated my Gen8 hosts with AMS v9.5.0-15. The problem has not come back.
Hey BigBjorn/joshswain/whomever, are you still having no issues with 9.5.0? My test instance has just reached 14 days, so I'm still a bit early to confirm if it's resolved.
Thanks
-Joshua
Everything still works for me.
No problems here on hosts at the 14 day mark since the last firmware flash and driver updates to match the 2014.02.0 SPP.
Hello Guys and Ladies,
I have VMware vSphere running on an IBM ThinkServer RD640. Every day I have to restart the physical host server because I am not able to connect to my guest web server or any other server on the host system.
