I've had to reboot my View Secure gateway a couple of times now because the View Portal was inaccessible. I'm a Windows IIS guy, not an Apache guy, so I'm not even sure what to look for to prevent a recurrence of whatever's happening. The server's in the DMZ and works normally most of the time. Because we're early in our View deployment, we've got fewer than a dozen users remotely accessing virtual desktops.
Any ideas?
Sorry for the delayed response daleallenc. I've been out of town.
We have not had an issue in over five business days. This is the longest period we have made it through. We had support look at the logs, but they could not tell us anything more about the Java errors/warnings we saw. Adding the memory seems to have helped. Also, we are using two connection servers (brokers) and have half our users going to one and the other half going to the other. We have a total of 42 concurrent connections.
Hi jflisher,
Aside from the memory changes were there other changes you made while looking for a solution? We're having similar problems and have plenty of memory in our connection and security servers.
Thanks!
We have a very similar issue here at a customer site.
We have three Connection Servers (View Manager) behind round-robin DNS. Each server has 2 GB of memory and uses 1 GB at most.
We restarted the Composer service and configured it to run under a domain account.
Is this a .Net problem?
I can't speak for the others, but we've narrowed our troubleshooting down to the network adapter or the TCP/IP stack for it. Restarting the service on our Security server, a VM, never helped. When we did a Windows repair of the network connection, everything returned to normal.
We have a question mark over the network configuration here. The VMs and the ESX hosts are fine; it's what lies beyond them that we have no control over.
We are now trying to convince the customer to get a network company to evaluate their setup.
In our environment we have about fifty virtual desktops connecting to two connection brokers (view servers). We did the following and have not had any issues in over thirty days.
1. Split the load so 25 machines talk to one server and 25 talk to the other.
2. Removed Microsoft NLB. This only complicated things. Forget Round Robin DNS.
3. Bumped up the RAM.
This solution obviously has a single point of failure: if the dependent View server fails, the client needs to re-establish its connection. You can put more than one View server in the View Client drop-down list; however, this requires a manual switch. In my opinion this should happen automatically and be transparent to the user. Hopefully we will see this in future releases.
John Flisher
Information Technology Manager
North Carolina State Ports Authority
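The manual switch described above can at least be scripted on the client side. The following is a minimal POSIX sh sketch, not part of View: it walks an ordered list of Connection Server names and prints the first one that a probe command accepts. The function name `first_reachable`, the probe hook, and the host names are hypothetical; a real probe might be a TCP check against port 443 (for example `nc -z host 443`, if your netcat supports `-z`).

```shell
#!/bin/sh
# Hypothetical client-side failover helper: print the first broker that passes a probe.
# Usage: first_reachable PROBE_CMD HOST...
# PROBE_CMD is called as "PROBE_CMD host" and must succeed (exit 0) for a healthy broker.
first_reachable() (
  probe=$1
  shift
  for host in "$@"; do
    if "$probe" "$host"; then
      echo "$host"   # first healthy broker wins; later entries are only fallbacks
      exit 0         # exits the subshell that forms the function body
    fi
  done
  exit 1             # no broker answered
)
```

For example, `first_reachable tcp_probe view1.example.com view2.example.com`, where `tcp_probe` is a small function you write around your preferred port check. A fixed order keeps a 25/25 split deterministic: each half of the users simply lists its own broker first.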
From: krismcewan <communities-emailer@vmware.com>
To: <john_flisher@ncports.com>
Date: 05/19/2009 08:54 AM
We have done a few tweaks too:
Switched the Connection Server service to log on with a domain account instead of the local service account.
Set reservations on the resource pools.
Removed stale DNS records for decommissioned AD servers.
It seems to have worked a treat.
We have 400 desktops to build up to across 14 ESX hosts and 3 Connection Servers, moving to 5 as the rollout happens. Round robin is needed in this instance. We are using Sun Ray servers as the broker, as Sun Rays are the thin clients.
I also found nLite to be very useful in creating gold XP images. Since virtual desktops don't need all the drivers under the sun, stripping the XP footprint down to 1 GB as opposed to 3 GB has made a big improvement.
Chris
What happens in your Round Robin setup if one of connection servers fails and a client gets sent to that address?
From: krismcewan <communities-emailer@vmware.com>
To: <john_flisher@ncports.com>
Date: 05/19/2009 10:53 AM
It should fail over to the next one.
I might try it later and see what happens.
Let me know how that goes.
The way I understand Round Robin, if the request goes to the IP address of a server that has failed, the client will not connect. If the client initiates the request again, in your scenario it will have an 80% chance of connecting to one of the remaining 4 out of 5 servers. A 20% failure rate is not acceptable in my environment while someone manually reconfigures DNS or gets the failed server back online.
This is why I think this issue really needs to be brought to VMware's attention by their customers. Failover of client connections to brokers should be built into View.
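The 80%/20% figures follow from simple arithmetic. Assuming each connection attempt picks one of n round-robin addresses uniformly and independently, with f of them dead, a single attempt fails with probability f/n, and k attempts in a row all fail with probability (f/n)^k. The helper below (a hypothetical name, using awk for the floating-point math) only evaluates that formula. Real resolvers and clients often cache a single answer, so retries are frequently not independent, which makes the real-world picture worse than this estimate.

```shell
#!/bin/sh
# Hypothetical helper: percentage chance that k attempts against a round-robin set
# of n servers, f of them dead, all land on a dead server. Assumes each attempt
# picks an address uniformly and independently (DNS caching breaks this assumption).
rr_failure_odds() {
  # $1 = n (servers in the set), $2 = f (failed servers), $3 = k (attempts)
  awk -v n="$1" -v f="$2" -v k="$3" 'BEGIN { printf "%.1f\n", 100 * (f / n) ^ k }'
}
```

With one of five servers down, `rr_failure_odds 5 1 1` prints 20.0, `rr_failure_odds 5 1 2` prints 4.0 and `rr_failure_odds 5 1 3` prints 0.8, so blind retries only help if the client really does get a fresh address each time.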
From: krismcewan <communities-emailer@vmware.com>
To: <john_flisher@ncports.com>
Date: 05/19/2009 11:44 AM
The reconfiguration isn't that bad: just remove the offending server from the DNS file.
I'm unable to test it just now, as the customer is getting their blades reconfigured.
There are a few things in View that need polishing: the web interface is slow, it only runs in IE, and there is the bottleneck of a single connection service.
Hopefully they will make huge improvements when integrating it into vCenter 4.
VCP, VTSP4, VSP4, MCSE, MCTS, IBMBCE and anything else I can learn.
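Removing the offending server from the DNS file can be done without hand-editing if the round-robin records live in a BIND-style zone file; on AD-integrated DNS you would use the DNS console instead. This is a minimal sketch under that assumption: it prints the zone with the A record for one name/address pair removed and everything else untouched, so you can review the result before replacing the file. The function name and the one-record-per-line layout are assumptions. If the zone is transferred to secondaries, remember to increment the SOA serial before reloading.

```shell
#!/bin/sh
# Hypothetical helper: print ZONEFILE without the A record NAME -> IP.
# Assumes one record per line, e.g. "view  IN  A  10.0.0.12" or "view 300 IN A 10.0.0.12".
drop_a_record() (
  zone=$1 name=$2 ip=$3
  awk -v name="$name" -v ip="$ip" '
    # NF >= 3 guards the $(NF-1) lookup on blank or short lines
    NF >= 3 && $1 == name && $(NF-1) == "A" && $NF == ip { next }
    { print }
  ' "$zone"
)
```

For example, `drop_a_record db.example.com view 10.0.0.12 > db.example.com.new`, then compare the two files with `diff` before swapping the new one in.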
Hi Everyone,
We've gone a week now without a problem getting the login page to come up. We found that if a user hit enter 4 or 5 times rapidly after entering the URL, not only did the login page come up for them, it came up normally for all other users as well. That became our workaround although the problem would inevitably resurface several hours later.
After some back and forth with VMware, one of our engineers may have found a lasting solution (at least to our version of the problem).
The error in the VMware View Connection Server logs was:
11:04:17,750 DEBUG <AJP-81> (Request1382) Connection marked as not reusable, closing.
11:04:30,437 DEBUG <SessionHandler> (E161C5E0D89B15A2A2C782B65309268D) hasSessionLostContact(): threshold = Wed May 13 11:03:30 EDT 2009, lastSeen = Wed May 13 11:04:29 EDT 2009
The workaround is to change the timeout value for the connection:
Go to the C:\Program Files\VMware\VMware View\Server\broker\conf directory on the Connection Server.
Open the server.xml file there and add a connectionTimeout attribute to the AJP connector on port 8009. Change this line:
<Connector port="8009" enableLookups="false" protocol="AJP/1.3" URIEncoding="UTF-8">
to:
<Connector port="8009" enableLookups="false" protocol="AJP/1.3" URIEncoding="UTF-8" connectionTimeout="900000">
Leave the way the element is closed (/> or a separate </Connector> tag) exactly as it is in your file.
You will need to restart the broker service once you have made the change.
This is an issue with some firewalls and the timeout values they apply to the required connections that pass through them between the Security Server and the Connection Server.
Hope this helps!
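If several Connection Servers need the same change, the edit can be scripted. Tomcat's connectionTimeout is expressed in milliseconds, so 900000 is 15 minutes. The sketch below assumes a POSIX shell with sed (for example Cygwin on the Connection Server, or run against a copy of the file); the function name is hypothetical. It backs up server.xml, adds the attribute only to the connector on port 8009, and changes nothing if the attribute is already present. The broker service still has to be restarted afterwards.

```shell
#!/bin/sh
# Hypothetical helper: add connectionTimeout to the AJP connector (port 8009) in server.xml.
# Usage: add_ajp_timeout /path/to/server.xml 900000   (milliseconds: 900000 ms = 15 min)
add_ajp_timeout() (
  conf=$1 timeout_ms=$2
  cp "$conf" "$conf.bak" || exit 1   # keep the original next to the edited file
  # On lines containing the port-8009 connector that lack connectionTimeout,
  # append the attribute right after URIEncoding="UTF-8".
  sed '/<Connector port="8009"/{
/connectionTimeout=/!s/URIEncoding="UTF-8"/& connectionTimeout="'"$timeout_ms"'"/
}' "$conf.bak" > "$conf"
)
```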
VMware Server 2 Web Access Connection Loss, vmware-hostd Crash Workarounds
November 16th, 2009, by George Knerr
With the upgrade to RHEL 5.4, CentOS 5.4 and Ubuntu 9.10, the latest 2.x.x versions of VMware Server are having serious Web Access GUI connection failures, specifically vmware-hostd crashing repeatedly. This has been found with VMware Server 2.0.0, VMware Server 2.0.1 and VMware Server 2.0.2. VMware Server 2.x.x was stable on the previous releases of these OSes. Below are two solutions that “appear” to make for a stable vmware-hostd process. You are strongly advised to satisfy yourself of the stability of vmware-hostd under these solutions before deploying them to a mission-critical environment.
Neither solution requires you to stop all VMware-related processes on the host server. The following steps assume vmware-hostd has crashed and left the VMware guests still running.
Note: if the ps command shows a vmware-hostd process still running, as in the output below, you have a different issue and this document is not for you.
ps -ef |grep vmware-hostd
root 10858 1 0 16:47 ? 00:00:02 /usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml
root 11055 11026 0 17:02 pts/3 00:00:00 grep vmware-hostd
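The check above can be made scriptable. The sketch below, a hypothetical helper, reads `ps -ef` output and succeeds only when a vmware-hostd process other than the grep itself is listed, which is exactly the case this note warns about. On Linux, `pgrep -f vmware-hostd` from procps is an alternative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `ps -ef` output on stdin lists a running vmware-hostd.
# Lines that mention grep are ignored, so "ps -ef | grep vmware-hostd" noise does not count.
hostd_running() {
  awk '/vmware-hostd/ && !/grep/ { found = 1 } END { exit !found }'
}

# Example: ps -ef | hostd_running && echo "vmware-hostd is still running; this fix is not for you"
```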
If you want to start the vmware-hostd process to manage your VMware
Server 2 guest operating systems again you may do so with the following
commands.
export LD_LIBRARY_PATH=/usr/lib/vmware/vmacore:/usr/lib/vmware/hostd:/usr/lib/vmware/lib/libxml2.so.2:/usr/lib/vmware/lib/libexpat.so.0:/usr/lib/vmware/lib/libstdc++.so.6:/usr/lib/vmware/lib/libgcc_s.so.1:/usr/lib/vmware/lib/libcrypto.so.0.9.8:/usr/lib/vmware/lib/libssl.so.0.9.8
/usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml &
<hit return/enter>
[1]+ Done /usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml
ps -ef | grep hostd
root 11140 1 22 17:13 ? 00:00:01 /usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml
root 11155 11026 0 17:13 pts/3 00:00:00 grep hostd
nohup is not needed in this instance as vmware-hostd runs as a daemon
but the ampersand “&” is. Otherwise you’ll get logged output to
the screen and when you exit your session vmware-hostd will stop too.
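The long LD_LIBRARY_PATH line is easy to mistype. As a sketch (the function name is hypothetical), it can be assembled from the list of directories it contains. The entries that look like file names, such as libxml2.so.2, are directories in VMware Server's bundled-library layout, which is also why solution #1 creates a libc.so.6 directory. With the default base, the result is identical to the string in the export command above.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the LD_LIBRARY_PATH used to start vmware-hostd by hand.
# Each entry is a directory under BASE (default /usr/lib/vmware); entries named like
# libraries are directories that hold the bundled library of that name.
vmware_ld_path() (
  base=${1:-/usr/lib/vmware}
  path=
  for d in vmacore hostd lib/libxml2.so.2 lib/libexpat.so.0 lib/libstdc++.so.6 \
           lib/libgcc_s.so.1 lib/libcrypto.so.0.9.8 lib/libssl.so.0.9.8; do
    path=${path:+$path:}$base/$d   # add a ":" separator only after the first entry
  done
  echo "$path"
)
```

Used in place of the long export: `export LD_LIBRARY_PATH="$(vmware_ld_path)"`, followed by the same `/usr/lib/vmware/bin/vmware-hostd -a -d -u /etc/vmware/hostd/config.xml &`.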
I recommend looking at both solutions. I’m currently employing
solution #2 but I’ll leave that decision up to you. Both allow you to
use the start/stop /etc/init.d/vmware script as you normally would and
are permanent unlike the quick fix above to get the vmware-hostd process
up and running again. Again with both solutions you need to determine
if they, in fact, produce a stable VMware Server 2 environment before
deployment to a mission critical environment.
Solution #1
Download and copy libc-2.5.so into place:
lynx http://mirror.centos.org/centos/5.3/os/x86_64/CentOS/glibc-2.5-34.x86_64.rpm
rpm -Uvh --root=/tmp/ --nodeps ./glibc-2.5-34.x86_64.rpm
mkdir /usr/lib/vmware/lib/libc.so.6
cp /tmp/lib64/libc-2.5.so /usr/lib/vmware/lib/libc.so.6/libc.so.6
Edit /usr/sbin/vmware-hostd, adding the following export command just before the last line of the script, so that its last lines read:
tail -3 /usr/sbin/vmware-hostd
export LD_LIBRARY_PATH=/usr/lib/vmware/lib/libc.so.6:$LD_LIBRARY_PATH
eval exec "$DEBUG_CMD" "$binary" "$@"
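The edit to /usr/sbin/vmware-hostd can also be scripted. The sketch below (a hypothetical helper) inserts one line immediately before the last line of a file and keeps a .bak copy. It assumes, as the tail output above shows, that the eval exec line really is the last line, so check with tail first. Writing through the existing file with > keeps its execute permission. Single-quote the argument so $LD_LIBRARY_PATH stays literal in the script.

```shell
#!/bin/sh
# Hypothetical helper: insert LINE just before the last line of FILE (backup in FILE.bak).
# Usage: insert_before_last_line /usr/sbin/vmware-hostd \
#          'export LD_LIBRARY_PATH=/usr/lib/vmware/lib/libc.so.6:$LD_LIBRARY_PATH'
insert_before_last_line() (
  file=$1 line=$2
  cp "$file" "$file.bak" || exit 1
  # Print each line one step late so the new line can go ahead of the final one.
  # Note: awk -v interprets backslash escapes in LINE; the export line has none.
  awk -v ins="$line" '
    NR > 1 { print prev }
    { prev = $0 }
    END { if (NR > 0) { print ins; print prev } }
  ' "$file.bak" > "$file"   # truncating in place keeps the file's mode bits
)
```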
Solution #2 (RHEL 5.4, CentOS 5.4 & Ubuntu 9.10)
Here is another method that does not require reverting to an older version of libc-2.5.so. The downside of this solution is that it circumvents the dynamic library path building of the /usr/sbin/vmware-hostd script and executes the /usr/lib/vmware/bin/vmware-hostd binary directly. I do not know whether this will present problems in the future.
Below is the snippet from the modified /etc/init.d/vmware. You can see I added an LD_LIBRARY_PATH statement, commented out the old exec call and added a new one.
# Start host agent
vmware_start_hostd() {
   export LD_LIBRARY_PATH=/usr/lib/vmware/vmacore:/usr/lib/vmware/hostd:/usr/lib/vmware/lib/libxml2.so.2:/usr/lib/vmware/lib/libexpat.so.0:/usr/lib/vmware/lib/libstdc++.so.6:/usr/lib/vmware/lib/libgcc_s.so.1:/usr/lib/vmware/lib/libcrypto.so.0.9.8:/usr/lib/vmware/lib/libssl.so.0.9.8
   vmware_bg_exec "`vmware_product_name` Host Agent" \
      "$vmdb_answer_LIBDIR/bin/vmware-hostd" -a -d -u "$vmware_etc_dir/hostd/config.xml"
   #"$vmdb_answer_SBINDIR/vmware-hostd" -a -d -u "$vmware_etc_dir/hostd/config.xml"
}
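Because this solution hard-codes the library directories instead of letting the wrapper script compute them, a later VMware update that moves a library could break it. A quick sanity check, sketched here as a hypothetical helper, lists any LD_LIBRARY_PATH entry that is not an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: print each colon-separated entry of a path list that is not a directory.
# Usage: missing_ld_dirs "$LD_LIBRARY_PATH"   (no output means every entry exists)
missing_ld_dirs() (
  set -f      # disable globbing while splitting
  IFS=:       # split the argument on colons only
  for d in $1; do
    [ -d "$d" ] || echo "$d"
  done
)
```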
If you don’t have critical guest OSes running, you can stop the guests via the VMware Server 2 Web Access GUI and restart VMware:
/etc/init.d/vmware restart
Stopping VMware autostart virtual machines:
Stopping VMware management services:
VMware Virtual Infrastructure Web Access
Stopping VMware services:
VM communication interface socket family:
Virtual machine communication interface
Bridged networking on /dev/vmnet0
Host-only networking on /dev/vmnet1
Host-only networking on /dev/vmnet8
Starting VMware services:
Virtual machine communication interface
VM communication interface socket family:
Bridged networking on /dev/vmnet0
Host-only networking on /dev/vmnet1 (background)
Host-only networking on /dev/vmnet8 (background)
VMware Server Authentication Daemon (background)
Starting VMware management services:
VMware Server Host Agent (background)
VMware Virtual Infrastructure Web Access
Starting VMware autostart virtual machines:
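The restart output only says that the Host Agent was started in the background, not that it is up. If you script the restart, a small retry loop avoids checking too early. This is a generic sketch with a hypothetical name: it reruns any command until it succeeds or the attempts run out.

```shell
#!/bin/sh
# Hypothetical helper: run CMD up to TRIES times, sleeping DELAY seconds between attempts.
# Usage: wait_until TRIES DELAY CMD [ARGS...]; exits 0 as soon as CMD succeeds, else 1.
wait_until() (
  tries=$1 delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && exit 0
    i=$((i + 1))
    [ "$i" -lt "$tries" ] && sleep "$delay"   # no pointless sleep after the last try
  done
  exit 1
)
```

For example: `wait_until 30 2 sh -c 'ps -ef | grep -v grep | grep -q vmware-hostd' && echo "Host Agent is running"`.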
As more information on this issue becomes available this post will be
updated. Please post your findings too.
This information was generated by my experimentation and the helpful
posts of the VMware Community, reference:
Andrew
November 26th, 2009 at 07:31
Thanks for the great blog post. This issue has been driving me nuts!
Shawn
December 9th, 2009 at 22:53
I am so happy Google found this post… I had been fighting this issue for 3 days to no avail.
You rock!
Michal Rogozinski
December 28th, 2009 at 17:58
Thanks!!! I almost lost my hair because of that stupid issue!
That was a really helpful post.
Jouni Renfors
January 7th, 2010 at 06:35
Thanks, man. I was ready to move to a different virtualization solution when I couldn’t find the problem with VMware.
Pander
January 28th, 2010 at 03:29
Thank you for the info!
Downloaded and used:
yum downgrade glibc-2.5-34.el5_3.1.i686.rpm glibc-common-2.5-34.el5_3.1.i686.rpm glibc-devel-2.5-34.el5_3.1.i686.rpm glibc-headers-2.5-34.el5_3.1.i686.rpm
Then rebooted.
Works well!
Just wanted to add to this... I've been fighting an issue with errors in the View logs similar to those other users have posted.
Our environment is Client > LAN > Firewall > View Security Server > Firewall > LAN > Connection Manager.
User "A" was able to connect to the external View Portal and authenticate successfully. The entitled desktops would display; however, their Status showed "Not Connected", so the tunnel never appeared to be built correctly.
The user's machine had multiple NICs. Once I adjusted the provider order so that the active NIC was at the top of the list, voilà! I was able to connect successfully. So if you have multiple NICs on your client machine, check the provider order.
Hope this helps someone.