Hi,
One of my ESXi 5.1 hosts is disconnected in vCenter, and I am unable to reconnect it.
When restarting the services with services.sh, I get this error on several services:
Connect to localhost failed: Connection failure
Running vpxa restart
Connect to localhost failed: Connection failure
Running sfcbd-watchdog restart
Connect to localhost failed: Connection failure
~ # tail -f var/log/vmkernel.log
2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)
2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)
2013-04-04T06:36:26.003Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)
2013-04-04T06:36:26.014Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)
2013-04-04T06:39:18.185Z cpu23:1212818)WARNING: UserLinux: 1331: unsupported: (void)
~ # tail -f /var/log/vpxa.log
2013-04-04T06:42:53.764Z [73C59B90 verbose 'hostdcnx'] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
2013-04-04T06:42:53.765Z [73C17B90 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to <cs p:0db52800, TCP:localhost:443>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-04-04T06:42:53.765Z [73C59B90 error 'httphttpUtil'] [HttpUtil::ExecuteRequest] Error in sending request - Connection reset by peer
2013-04-04T06:42:53.765Z [73C59B90 error 'hostdcnx'] [VpxaHalCnxHostagent] Failed to discover version: vim.fault.HttpFault
2013-04-04T06:42:53.765Z [73C59B90 warning 'hostdcnx'] [VpxaHalCnxHostagent] Could not resolve version for authenticating to host agent
2013-04-04T06:43:13.767Z [73C38B90 verbose 'hostdcnx'] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
2013-04-04T06:43:13.768Z [FFB99B90 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to <cs p:0db60350, TCP:localhost:443>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-04-04T06:43:13.768Z [73C38B90 error 'httphttpUtil'] [HttpUtil::ExecuteRequest] Error in sending request - Connection reset by peer
The ESXi host responds to ping by both DNS name and IP, including from other hosts.
Does anyone know how I can solve this problem without a reboot?
Regards,
Thijs
I have an ongoing case with Veeam and it appears that Veeam One could cause this problem.
My support case with them is Veeam Support - Case # 00503976
I have uninstalled Veeam One and am very wary of running Backup & Replication as well.
Did you ever find a resolution to this or determine that the root cause was Veeam B&R for sure? I'm currently running Veeam B&R 7.x (latest Patch #4 build) and having this issue with one of our ESX 5.1 hosts. All hardware and drivers are verified on the HCL in our environment and it only seems to be affecting one of the hosts.
Jason
Hi Jason,
My solution was to stop using Veeam; I have not had these problems since.
I have been looking for a VM replication solution that does not use snapshots on ESXi as well. Do you know of any?
By the way, which datastore are you using?
We are having similar issues and also use Veeam One and Veeam B&R. Did anything come of your ticket with Veeam?
thanks,
Claude
Hi Claude,
My resolution was to stop using Veeam, and it has been great since!
Right now I am planning to use Tintri VM replication for my DR strategy.
May I ask what storage you are using?
Hi there,
I'm digging up this thread (sorry for my English) because we had a major issue in our virtual infrastructure that seems to be the same problem.
We are using Veeam B&R and Veeam ONE, and some VMs on a specific host and a specific datastore were completely locked.
We could neither power on the VMs nor migrate them. I tried removing the VMs and the host from the inventory and re-adding them, with no luck.
vmkfstools -D shows no owner:
Lock [type 10c00001 offset 58503168 v 3975, hb offset 4112384
gen 521, mode 0, owner 00000000-00000000-0000-000000000000 mtime 5962377
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 101, 78>, gen 3917, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 0, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 4305, bs 8192
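For anyone reading a dump like the one above, here is a minimal sketch that pulls the `mode` and `owner` fields out of saved `vmkfstools -D` output. It assumes the text layout shown in this post; the function name `lock_summary` is made up for illustration and is not a VMware tool. As far as I know, mode 0 means no lock is currently held, which would fit the all-zero owner here.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): read `vmkfstools -D` text on stdin
# and print "<mode> <owner-uuid>" taken from the "gen ..., mode ..., owner ..." line.
lock_summary() {
    # Only the lock line has "mode N, owner <uuid>"; the later "mode 600"
    # (file permissions) is not followed by ", owner" and is ignored.
    sed -n 's/.*mode \([0-9][0-9]*\), owner \([0-9a-f-]*\).*/\1 \2/p' | head -n 1
}
```

Usage would be something like `lock_summary < saved_dump.txt`, where `saved_dump.txt` is a file you pasted the output into.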
I had to remove the VMs from the inventory, create a new VM, move the vmdk into the new VM's directory, and then attach the vmdk to the new VM. That worked.
For people who hit the same problem: you can find the actual owner of the file by using rm (the owner may also be shown when you try to vMotion).
When you try to rm the files, ESXi times out:
rm: can't remove 'HAProxy-01-ctk.vmdk': Resource temporarily unavailable
rm: can't remove 'HAProxy-01.vmx.lck': Resource temporarily unavailable
rm: can't remove 'HAProxy-01.vmx~': Resource temporarily unavailable
rm: can't remove 'vmware.log': Resource temporarily unavailable
BUT the logs then show the faulty ESXi host:
2016-02-23T12:58:53.875Z cpu24:9937646)DLX: 4230: vol 'BJ-ISN-06', lock at 43160576: [Req mode: 1] Not free:
2016-02-23T12:58:53.875Z cpu24:9937646)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]
2016-02-23T12:58:53.875Z cpu24:9937646)Res3: 5732: Rank violation threshold reached: cid 0xc1d00002, resType 4, cnum
2016-02-23T12:58:56.069Z cpu0:9937646)DLX: 3706: vol 'BJ-ISN-06', lock at 43160576: [Req mode 1] Checking liveness:
2016-02-23T12:58:56.069Z cpu0:9937646)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]
OK, so a stale lock happens from time to time, but what was really WEIRD is that the faulty ESXi host, the one with MAC address 3c4a926c279c, was in a DIFFERENT cluster, off-site!
I wondered why this would happen, since this particular VM would never run on that separate cluster. Then I noticed that my Veeam ONE server was running on that specific off-site host!
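To make the owner-to-host mapping easier to repeat, here is a small sketch that turns the last field of the lock owner UUID into a colon-separated MAC address. The function name `owner_to_mac` is hypothetical, and it relies on what was observed above, namely that the last field of the owner matches a MAC address of the host holding the lock.

```shell
#!/bin/sh
# Hypothetical helper: print the last field of a VMFS lock owner UUID
# as a colon-separated MAC address.
owner_to_mac() {
    last=${1##*-}                                    # keep everything after the final '-'
    printf '%s\n' "$last" | sed 's/../&:/g; s/:$//'  # add ':' after each byte, drop the trailing one
}

owner_to_mac 56b5cf5d-f208f93a-4755-3c4a926c279c    # prints 3c:4a:92:6c:27:9c
```

You can then compare the result with the MAC Address column of `esxcli network nic list` on each candidate host.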
I'm using HP blades with the latest SPP and the latest VMware patches on ESXi 5.1.
Solution no. 1: annoying, but recreate the VM and attach the old vmdk.
Solution no. 2: reboot the ESXi host running the Veeam ONE server. That may or may not work; I used solution no. 1 and only then found no. 2.
Hope this helps.
Well, for some reason I can't edit my own post, so I'm adding new info here:
After I vMotioned the Veeam ONE server to another host, I could rm -rf the old VM directory, and this is what I got in the vmkernel logs:
2016-02-23T13:09:43.259Z cpu26:5686842)DLX: 3706: vol 'BJ-ISN-06', lock at 43160576: [Req mode 1] Checking liveness:
2016-02-23T13:09:43.259Z cpu26:5686842)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]
2016-02-23T13:09:43.260Z cpu26:5686842)DLX: 3321: Clearing wrong owner for lock at 43160576 with [HB state abcdef01 offset 3624960 gen 1088 stampUS 1480954430581 uuid 00000000-00000000-0000-000000000000 jrnl <FB 0> drv 14.58]
Hi,
I'm experiencing exactly the same issue, but I'm running 6.0.
The issue is described here, and I actually have all the symptoms described:
but I can't confirm the log entries. The host is currently disconnected, but still alive in some way.
And yes, as some of you have said, I am also running Veeam Backup & Replication with Veeam ONE.
So is it possible that this issue still exists?
Try the commands below to check the hostd service and, if needed, start it:
/etc/init.d/hostd status
/etc/init.d/hostd start
If hostd still does not start, check /var/log/vmkernel.log, /var/log/hostd.log and /var/log/vpxa.log.
When esxcli commands are not working, you can run localcli instead of esxcli:
# localcli network firewall get
# localcli network firewall set --enabled false
# localcli system maintenanceMode set --enable true
# localcli vm process list
# localcli vm process kill -w <World ID> -t soft (Shutdown VMs)
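The hostd check above could be wrapped into a short script. This is only a sketch: it assumes the init script reports a running service with the words "is running" (so check the actual wording on your build first), and the function name `hostd_needs_start` is made up.

```shell
#!/bin/sh
# Sketch only: decide from the status text whether hostd needs starting.
# Assumption: a running service is reported with the words "is running".
hostd_needs_start() {
    case "$1" in
        *"is running"*) return 1 ;;   # already up, nothing to do
        *)              return 0 ;;   # anything else: try a start
    esac
}

# Only touch the service on a machine that actually has the init script.
if [ -x /etc/init.d/hostd ]; then
    status=$(/etc/init.d/hostd status 2>&1)
    if hostd_needs_start "$status"; then
        /etc/init.d/hostd start
    else
        echo "hostd already running"
    fi
fi
```

If hostd still refuses to start after this, the log files mentioned above are the next place to look.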