VMware Cloud Community
ThijsW
Contributor

ESXi 5.1 - Connect to localhost failed: Connection failure

Hi,

One of my ESXi 5.1 hosts is disconnected in vCenter, and I am unable to reconnect it.

When I restart the services with services.sh, several of them report this error:

Connect to localhost failed: Connection failure

Running vpxa restart

Connect to localhost failed: Connection failure

Running sfcbd-watchdog restart

Connect to localhost failed: Connection failure

~ # tail -f /var/log/vmkernel.log
2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)
2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)
2013-04-04T06:36:26.003Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)
2013-04-04T06:36:26.014Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)
2013-04-04T06:39:18.185Z cpu23:1212818)WARNING: UserLinux: 1331: unsupported: (void)
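To see how often the host fails to create sockets, the socreate warnings in the vmkernel.log excerpt above can be tallied. This is only a minimal sketch: the function name count_socreate_failures is made up for illustration, and the regular expression assumes the message format stays exactly as in the lines quoted above.

```python
import re
from collections import Counter

# Matches the "socreate(...) failed with error ..." warnings quoted above.
# Assumption: the message layout is identical to the excerpt in this post.
SOCREATE_RE = re.compile(
    r"socreate\(type=(\d+), proto=(\d+)\) failed with error (.+?) \((\d+)\)"
)


def count_socreate_failures(lines):
    """Count socreate failures per (type, proto, error text, error number)."""
    counts = Counter()
    for line in lines:
        match = SOCREATE_RE.search(line)
        if match:
            sock_type, proto, message, err_no = match.groups()
            counts[(int(sock_type), int(proto), message, int(err_no))] += 1
    return counts


# Lines copied from the vmkernel.log excerpt in this post.
sample = [
    "2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)",
    "2013-04-04T06:36:23.424Z cpu13:1211854)WARNING: Tcpip: 1304: socreate(type=1, proto=6) failed with error No buffer space available (55)",
    "2013-04-04T06:36:26.003Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)",
    "2013-04-04T06:36:26.014Z cpu18:1211111)WARNING: Tcpip: 1304: socreate(type=2, proto=0) failed with error No buffer space available (55)",
    "2013-04-04T06:39:18.185Z cpu23:1212818)WARNING: UserLinux: 1331: unsupported: (void)",
]

for key, n in sorted(count_socreate_failures(sample).items()):
    print(key, n)
```

To run it against a real log, pass an open file instead of the sample list, for example count_socreate_failures(open("/var/log/vmkernel.log")).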

~ # tail -f /var/log/vpxa.log
2013-04-04T06:42:53.764Z [73C59B90 verbose 'hostdcnx'] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
2013-04-04T06:42:53.765Z [73C17B90 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to <cs p:0db52800, TCP:localhost:443>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-04-04T06:42:53.765Z [73C59B90 error 'httphttpUtil'] [HttpUtil::ExecuteRequest] Error in sending request - Connection reset by peer
2013-04-04T06:42:53.765Z [73C59B90 error 'hostdcnx'] [VpxaHalCnxHostagent] Failed to discover version: vim.fault.HttpFault
2013-04-04T06:42:53.765Z [73C59B90 warning 'hostdcnx'] [VpxaHalCnxHostagent] Could not resolve version for authenticating to host agent
2013-04-04T06:43:13.767Z [73C38B90 verbose 'hostdcnx'] [VpxaHalCnxHostagent] Creating temporary connect spec: localhost:443
2013-04-04T06:43:13.768Z [FFB99B90 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to <cs p:0db60350, TCP:localhost:443>; cnx: (null), error: N7Vmacore15SystemExceptionE(Connection reset by peer)
2013-04-04T06:43:13.768Z [73C38B90 error 'httphttpUtil'] [HttpUtil::ExecuteRequest] Error in sending request - Connection reset by peer
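The vpxa.log lines above show vpxa failing to reach localhost:443. A quick way to check the same thing by hand is a plain TCP connect, sketched below. The function name probe_tcp is hypothetical, and a successful connect only proves that something accepts the connection; it does not show that the service behind the port answers requests properly.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connect and return 'open' or a short failure description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers refused and reset connections, and also failures to
        # create the socket at all.
        return f"error: {exc.strerror or exc}"


if __name__ == "__main__":
    # Same endpoint that vpxa tries to reach in the log above.
    print(probe_tcp("localhost", 443))
```

If the host cannot even create a socket, as the vmkernel warnings suggest, the probe would presumably report that error rather than "open".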

The ESXi host responds to ping by both DNS name and IP address, including from other hosts.

Does anyone know how I can solve this problem without a reboot?

Regards,

Thijs

29 Replies
kellogstri
Contributor

I have an ongoing case with Veeam and it appears that Veeam One could cause this problem.

My support case with them is Veeam Support - Case # 00503976

I have uninstalled Veeam ONE and am very wary of running Backup & Replication as well.

jasolution
Contributor

Did you ever find a resolution to this or determine that the root cause was Veeam B&R for sure? I'm currently running Veeam B&R 7.x (latest Patch #4 build) and having this issue with one of our ESX 5.1 hosts. All hardware and drivers are verified on the HCL in our environment and it only seems to be affecting one of the hosts.

Jason

kellogstri
Contributor

Hi Jason,

My solution was to stop using Veeam. Since then the problem has not occurred again.

I have also been looking for a VM replication solution that does not use snapshots on the ESXi host. Do you know of any?

By the way, which datastore are you using?

jsc8041
Contributor

We are having similar issues and also use Veeam ONE and Veeam B&R. Did anything come of your ticket with Veeam?

Thanks,

Claude

kellogstri
Contributor

Hi Claude,

My resolution was to stop using Veeam, and it has been great since!

Right now I am going to use Tintri ReplicateVM for my DR strategy.

May I ask what storage you are using?

Sharantyr3
Enthusiast

Hi there,

I'm digging up this thread (sorry for my English) because we had a major issue in our virtual infrastructure that seems to be the same problem.

We are using Veeam B&R and Veeam ONE, and some VMs on a specific host and a specific datastore were completely locked.

I could neither power on the VMs nor migrate them. I tried removing and re-adding the VMs and the host in the inventory, without success.

vmkfstools -D shows no owner:

Lock [type 10c00001 offset 58503168 v 3975, hb offset 4112384
gen 521, mode 0, owner 00000000-00000000-0000-000000000000 mtime 5962377
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 101, 78>, gen 3917, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 0, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 4305, bs 8192

I had to remove the VMs from the inventory, create a new VM, move the vmdk into the new VM's directory, and then add the vmdk to the new VM. That works.

For anyone else who has this problem: you can find the actual owner of the file by using rm (it may also be displayed when you try to vMotion).

When you try to rm the files, ESXi will time out:

rm: can't remove 'HAProxy-01-ctk.vmdk': Resource temporarily unavailable
rm: can't remove 'HAProxy-01.vmx.lck': Resource temporarily unavailable
rm: can't remove 'HAProxy-01.vmx~': Resource temporarily unavailable
rm: can't remove 'vmware.log': Resource temporarily unavailable

BUT the logs will then show which ESXi host is at fault:

2016-02-23T12:58:53.875Z cpu24:9937646)DLX: 4230: vol 'BJ-ISN-06', lock at 43160576: [Req mode: 1] Not free:
2016-02-23T12:58:53.875Z cpu24:9937646)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]
2016-02-23T12:58:53.875Z cpu24:9937646)Res3: 5732: Rank violation threshold reached: cid 0xc1d00002, resType 4, cnum
2016-02-23T12:58:56.069Z cpu0:9937646)DLX: 3706: vol 'BJ-ISN-06', lock at 43160576: [Req mode 1] Checking liveness:
2016-02-23T12:58:56.069Z cpu0:9937646)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]

OK, so a faulty lock can happen from time to time, but what was really WEIRD is that the faulty ESXi host, the one whose MAC address is 3c4a926c279c, was in a DIFFERENT cluster, off-site!

I wondered why this would happen, since this particular VM would never run on that separate cluster. Then I noticed that my Veeam ONE server was running on that specific off-site host!
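Pulling the owner out of the DLX log lines can be scripted. The sketch below reads the last 12 hex digits of the owner field as a MAC address, which is the interpretation this post relies on rather than something verified here; the function name owner_macs is made up for illustration.

```python
import re

# Owner field as it appears in the lock dumps above: 8-8-4-12 hex digits.
OWNER_RE = re.compile(
    r"owner ([0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12}))"
)


def owner_macs(lines):
    """Return {owner_uuid: mac} for every non-zero lock owner found."""
    found = {}
    for line in lines:
        for uuid, tail in OWNER_RE.findall(line):
            if uuid.replace("-", "").strip("0") == "":
                continue  # all-zero owner: no host is recorded as holder
            # Assumption: the last 12 hex digits are the MAC of the owner.
            found[uuid] = ":".join(tail[i:i + 2] for i in range(0, 12, 2))
    return found


# Lines copied from the vmkfstools and vmkernel output in this post.
sample = [
    "gen 521, mode 0, owner 00000000-00000000-0000-000000000000 mtime 5962377",
    "gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709",
]

for uuid, mac in owner_macs(sample).items():
    print(uuid, "->", mac)
```

The resulting MAC can then be compared with the NIC addresses of the hosts in every cluster that can see the datastore, including off-site ones.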

I'm using HP blades with the latest SPP and the latest VMware patches on ESXi 5.1.

So, solution no. 1: annoying, but recreate the VM and attach the old vmdk.

Solution no. 2: reboot the ESXi host running the Veeam ONE server. That may or may not work; I used solution no. 1 and only then found no. 2.

Hope this helps.

Sharantyr3
Enthusiast

Well, for some reason I can't edit my own post, so I'm adding new information here:

After I vMotioned the Veeam ONE server to another host, I could rm -rf the old VM directory, and here is what I got in the vmkernel logs:

2016-02-23T13:09:43.259Z cpu26:5686842)DLX: 3706: vol 'BJ-ISN-06', lock at 43160576: [Req mode 1] Checking liveness:
2016-02-23T13:09:43.259Z cpu26:5686842)[type 10c00005 offset 43160576 v 78794, hb offset 3624960
gen 1087, mode 1, owner 56b5cf5d-f208f93a-4755-3c4a926c279c mtime 171709
num 0 gblnum 0 gblgen 0 gblbrk 0]
2016-02-23T13:09:43.260Z cpu26:5686842)DLX: 3321: Clearing wrong owner for lock at 43160576 with [HB state abcdef01 offset 3624960 gen 1088 stampUS 1480954430581 uuid 00000000-00000000-0000-000000000000 jrnl <FB 0> drv 14.58]

k0eff
Contributor

Hi,

I'm experiencing exactly the same issue, but I'm running 6.0.

The issue is described here, and I actually have all the symptoms listed:

VMware Knowledge Base

but I can't confirm the log entries. The host is currently disconnected but still alive in some way.

And yes, as some of you have said, I am also running Veeam Backup & Replication with Veeam ONE.

So is it possible that the issue still exists?

sarikrizvi
Enthusiast

Try the commands below to check the hostd service and start it if needed:

/etc/init.d/hostd status
/etc/init.d/hostd start

If hostd does not start, check /var/log/vmkernel.log, /var/log/hostd.log and /var/log/vpxa.log.

When esxcli is not working, you can run localcli instead:

# localcli network firewall get
# localcli network firewall set --enabled false
# localcli system maintenanceMode set --enable true
# localcli vm process list
# localcli vm process kill -w <World ID> -t soft     (shuts down the VM)
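The last two commands can be chained: read the World ID of each VM from the process list, then build one soft-kill command per VM. The sketch below assumes the list output consists of a VM name line followed by indented "Key: Value" lines including "World ID"; that layout is an assumption here, the sample text and its World ID 12345 are invented for illustration, and both function names are hypothetical. It only builds the command strings and does not run anything.

```python
def parse_vm_process_list(text):
    """Parse blocks made of a VM name line plus indented 'Key: Value' lines."""
    vms = []
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank line between blocks
        if not raw[0].isspace():
            # Unindented line: assumed to start a new VM block.
            current = {"Name": raw.strip()}
            vms.append(current)
        elif current is not None and ":" in raw:
            key, _, value = raw.strip().partition(":")
            current[key.strip()] = value.strip()
    return vms


def soft_kill_commands(vms):
    """Build the soft-kill command from the post above for each parsed VM."""
    return [
        f"localcli vm process kill -w {vm['World ID']} -t soft"
        for vm in vms
        if "World ID" in vm
    ]


# Invented sample in the assumed layout; not real output from a host.
sample = """HAProxy-01
   World ID: 12345
   Display Name: HAProxy-01
"""

for command in soft_kill_commands(parse_vm_process_list(sample)):
    print(command)
```

Reviewing the printed commands before running them by hand keeps a parsing mistake from shutting down the wrong VM.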

Regards,
SARIK (Infrastructure Architect)