VMware Cloud Community
tman24
Enthusiast

Keep losing remote management. What gives?

I'm getting pretty frustrated by this now, so I'll explain the problem.

I have a few ESXi 3.5 servers running on approved hardware (Dell PowerEdge 2950s, to be precise). Generally they run OK, but recently (and by this I mean for the last couple of months) remote management keeps breaking. VIC will complain that it can't connect.

If I try the 'unsupported' ALT-F1 access from the console, I can type 'unsupported' and get the red banner, but then sometimes I get the logon prompt and other times it just sits there and does nothing. If I can log on, sometimes I get the message 'cannot fork' and I can't do anything. If I get past this point, I can generally run '/sbin/services.sh restart' to get remote access back, but it's hit and miss. Remote SSH access also won't work, as the ESXi host won't accept connections (even though SSH is enabled). At this stage, my only option is to reboot the host.

The odd thing is that during all this the VMs are generally running fine. It's just the remote admin that totally breaks, and restarting the management network from the console makes no difference. I often see this (repeating) at the console as well:

killed

stopping sfcbd

starting sfcbd

At the moment 3 of my 5 ESXi servers are in this state. I'll reboot them and they'll be fine, only for the same thing to happen again a week or two later. They are all running U4, build 153875.

Surely this is too much of a coincidence!

Any help appreciated.
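
One way to notice the failure before VIC starts complaining is to probe the management ports from another machine on a schedule. The following is only a minimal sketch, not a VMware tool. The port list (22 for SSH, 443 and 902 for the VI Client) and the names `port_open` and `check_host` are assumptions chosen for illustration, and the check only shows whether a TCP connection can be opened, not whether the management agents behind the port are healthy.

```python
import socket

# Assumed management ports for illustration: SSH, plus the two ports
# the VI Client normally talks to on an ESX/ESXi 3.5 host.
MGMT_PORTS = {"ssh": 22, "https": 443, "authd": 902}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_host(host, ports=MGMT_PORTS, timeout=3.0):
    """Probe each named port on host and return a {name: reachable} dict."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


# Example (hypothetical host name):
#   print(check_host("esxi01.example.local"))
```

Run from a workstation or monitoring box, a result where every port reports `False` while the VMs still answer pings would match the symptom described above, where the guests stay up but remote management is gone.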

7 Replies
espi3030
Expert

Could there be an IP conflict? Maybe another device on your network is taking your ESX IPs? It happened to me once.

DSTAVERT
Immortal

At least one person has suggested that disabling SSH access helped.

-- David -- VMware Communities Moderator
mcowger
Immortal

"'cannot fork' "

^^ That message is key. Your console OS is running out of memory. Up the amount of memory assigned to the COS/vmKernel, or work with VMware to determine why it's running out of memory.

--Matt

VCP, vExpert, Unix Geek

--Matt VCDX #52 blog.cowger.us
tman24
Enthusiast

Thanks for the suggestions. I can confirm there is no IP address conflict. The ESXi host is statically assigned, and I control the assignment. I also don't get the 'cannot fork' message all the time, but I have seen it when trying to access the console via ALT-F1 (usually a good sign that all remote management has been lost). The servers in question are dual quad-core with 16GB RAM. Resources show neither server is under any particular load, CPU- or memory-wise, so would it still be possible to run out of resources? SSH is enabled on all the servers, but it's been like that since day one, and these servers have been online for over a year now. It's only in the last couple of months that the problem has been happening regularly.

Could this be related to a patch? I try to keep the servers up to date. I know they're running a little behind at the moment, and I do plan to bring them up to date sometime soon. I just can't understand why so many of the ESXi hosts would have this problem!

Josh26
Virtuoso

Did you at any point follow some of the "helpful" guides out there suggesting you reduce the amount of memory that is allocated to the console OS?

tman24
Enthusiast

Regarding the suggestion from Josh26, I have made no changes to the memory allocated to the console. The only thing I did post install was to enable SSH via the documented method.

mcowger
Immortal

The COS gets its own memory allocation, separate from the memory of the server as a whole.

I can 100% guarantee you this is a memory problem within your COS. You need to open a ticket with VMware to have them figure out where your memory is going. vfork errors like that are exclusively memory exhaustion problems.
