We have an automated pool of 150 desktops which get refreshed on logoff OR after being disconnected for >60 minutes. I've noticed that we gradually get more and more "Agent Unreachable" statuses for our VDI desktops over a period of 4 to 6 hours, until I go in, remove the VMs from disk, and let provisioning re-provision the VMs to reset them. Doing a desktop reset does not fix the issue. I have made sure that the VMs are receiving IP addresses from DHCP, and they are. We are not running out of addresses, since these 150 desktops have 250 IP addresses in the DHCP scope with 15 minute lease times.
We are running VMware View 5.2. If any more information is needed, just ask.
Can you try running the support tool on one of the desktops that is reported as "Agent Unreachable" and share the output? See VMware KB: Collecting diagnostic information for VMware View. That should run some basic connectivity tests and gather up relevant log files.
Turns out they aren't getting DHCP addresses after all. I checked both the dvswitch and our DHCP server and both are only at 60 - 70% capacity currently and maxing out at 75%.
While running the support tool I realized that the network connections had the little caution flag on them (Windows 7 desktops). So I guess the new question is: how can I make sure that these machines always get a DHCP address when they get provisioned?
Edit: Although I don't think this is a VMware issue now, do you have any thoughts as to what I could do to make sure they are getting DHCP addresses?
The first thing I would check is whether explicitly running ipconfig /renew [adapter] always succeeds on your broken VMs. Check to see if it's timing related, e.g. if you do an explicit renew on a VM and then refresh it, does the problem always reproduce? Also try doing an explicit ipconfig /release on the template before shutting it down for the snapshot operation and see if that makes any difference.
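A quick way to tell whether a renew actually worked is to look at the address the adapter ends up with: Windows falls back to a 169.254.x.x (APIPA) address when it gets no DHCP answer. Below is a minimal Python sketch of that check, assuming you have already collected each desktop's current IPv4 address somehow (the desktop names and addresses are made up for illustration).

```python
import ipaddress

def is_apipa(addr):
    """Return True if addr is an IPv4 link-local (APIPA) address in 169.254.0.0/16."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local

def desktops_without_dhcp(desktops):
    """desktops: dict of desktop name -> current IPv4 address (hypothetical inventory).

    Returns the sorted names of desktops that fell back to APIPA.
    """
    return sorted(name for name, addr in desktops.items() if is_apipa(addr))

if __name__ == "__main__":
    # Example data only; these names and addresses are not from the thread.
    inventory = {"vdi-01": "10.0.0.5", "vdi-02": "169.254.3.4"}
    print(desktops_without_dhcp(inventory))
```

Running this before and after an explicit renew on the same set of desktops would show whether the renew is what fixes them.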
The release on the template VM did not fix the issue. We are still experiencing the issue in two pools with the release in the template and in another 3 without the release in their templates. Both sets of pools are experiencing the same issues and symptoms.
We checked our DNS servers (our DNS servers are also our DHCP servers) and all we see is the IP lease expiring after its lease time is up. We don't see any chatter at the halfway point. The IP lease is 1 hour, and a normal client would have chatter like Request DHCP -> DHCP Reply -> Accept DHCP Lease... then 30 minutes later request renewal, and repeat until the client has been shut down for > 1 hour. Right now what we are seeing is that the client comes up once and goes through that, then gets refreshed. After this refresh, we see "Agent Unreachable" in View, and then in DNS we see the lease expire. There is no chatter for renewal or anything, because on refresh the desktop is pulling an APIPA address.
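For anyone comparing server logs against what a healthy client should do, here is a small Python sketch of the expected timeline for a lease. It assumes the RFC 2131 default timers (renewal at 50% of the lease, rebinding at 87.5%); a server can override those, and the function name and timestamp are just for illustration.

```python
from datetime import datetime, timedelta

def lease_timeline(acquired, lease):
    """Return when a healthy client should renew, rebind, and lose its lease.

    Assumes the RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    return {
        "renew": acquired + lease * 0.5,     # T1: unicast renewal to the leasing server
        "rebind": acquired + lease * 0.875,  # T2: broadcast to any DHCP server
        "expire": acquired + lease,          # lease is gone if nothing answered
    }

if __name__ == "__main__":
    # Arbitrary example timestamp with the 1 hour lease described above.
    for event, when in lease_timeline(datetime(2013, 1, 1, 9, 0), timedelta(hours=1)).items():
        print(event, when.time())
```

If the server log shows the initial lease and then only the expiry, with nothing around the renew time, the client never came back with a working address after the refresh.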
Is your DHCP server serving requests to other clients in your organization? Maybe it is a problem on the DHCP side and has nothing to do with View or the desktops.
@VCPGUY yes it is, and it has been working fine since long before VDI entered the environment, but see below.
So I found the issue later today. The problem was that in our View Composer settings we have it set to NOT reuse the same computer accounts, because we were under the impression that QuickPrep was supposed to maintain common SIDs for its desktops. Apparently not so. We looked at our Windows DNS records and found that a lot of the records in there for our VDI desktops had dead SIDs in their permissions instead of the computer account. Because of this, the VDI desktops were not able to update their DNS records and therefore failed to receive DHCP addresses.
So, our plan of action is to enable "Allow reuse of computer accounts", delete ALL of our VDI DNS records in our Windows DNS servers, and recompose the pools.
We tested it on 2 of our 6 pools with a flush and release before the snapshot, and even added a login script that runs ipconfig /registerdns to force DNS registration. It still wouldn't register, since the computer name still exists in DNS but the SID on the DNS entry is dead, so the computer account trying to update the DNS record has no permission to do so.
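To find the affected records in bulk, one option is to dump the permissions on the DNS records and look for entries that still show up as a raw SID (which is how Windows displays an account it can no longer resolve). The Python sketch below only does the filtering part and assumes you already have the dump as (record name, principal) pairs; that input format, the CORP domain, and the sample SID are all hypothetical.

```python
import re

# A principal that is still a raw SID string, e.g. S-1-5-21-...-4567,
# means the account behind it could not be resolved (likely deleted).
SID_PATTERN = re.compile(r"^S-1-\d+(?:-\d+)+$")

def records_with_dead_sids(acl_entries):
    """acl_entries: iterable of (record_name, principal) pairs (hypothetical export).

    Returns the sorted, de-duplicated names of records that carry an unresolved SID.
    """
    return sorted({
        name
        for name, principal in acl_entries
        if SID_PATTERN.match(principal.strip())
    })

if __name__ == "__main__":
    # Made-up sample entries for illustration.
    sample = [
        ("vdi-01", "CORP\\vdi-01$"),
        ("vdi-02", "S-1-5-21-1111111111-2222222222-3333333333-4567"),
    ]
    print(records_with_dead_sids(sample))
```

The records it returns are the ones worth deleting before the recompose.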
Our DHCP scope options ticked are the following:
Enable DNS dynamic updates according... (ticked)
Always dynamically update DNS A and PTR records (ticked)
Discard A and PTR records when lease is deleted (ticked)
Dynamically update DNS A and PTR records for DHCP (ticked)
your issue sounds like this:
DNS issues give a real pain in VDI environments if not configured properly.
Here is what you need to do to remain trouble free.
- Create a new AD service account for DHCP DYNDNS updates.
- Add the user and the DHCP Server computername to a group in AD called 'DNSUpdateProxy'
- Configure the Advanced properties of IPv4 in your DHCP console and set the above credentials.
- Delete all records in DNS which were created earlier for VDI clients. Don't worry! These will be recreated automatically once the DNS client sends a re-registration request, which typically takes a few minutes. You can run ipconfig /registerdns from a client to do the same, but that's not practical with hundreds of VDI desktops, so leave it to the system.
This way the dynamic update is sent by a user who has access to do so. By default, this privilege belongs to the DHCP server during the initial lease. The server is unable to dynamically update the DNS records because they change so quickly in VDI environments.
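After deleting the old records, it can help to confirm that the desktops really do come back in DNS instead of checking them by hand. A minimal Python sketch, assuming you have a list of desktop hostnames (the names here are invented) and that the machine running it uses the same DNS servers:

```python
import socket

def missing_from_dns(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that do not currently resolve.

    `resolve` is injectable so the check can be tried without a live DNS server.
    """
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
            missing.append(name)
    return missing

if __name__ == "__main__":
    # Hypothetical desktop names; replace with your own pool's naming pattern.
    print(missing_from_dns(["vdi-01.example.local", "vdi-02.example.local"]))
```

Anything still listed after the clients have had time to re-register is worth a closer look.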
I have tried this approach and still see the dead SID as the owner of the record.
I have deleted all records, and I noticed that the PTR record is owned by the service account but the A record is not.
I am wondering whether the A record owner should be the computer account or the service account.
Had the same issue with Full Clones, along with other symptoms. Some of the sessions were unable to connect to certain websites, and on other sessions the guest NIC would not remain enabled. On each of the hosts, we increased the "Number of ports" in the vSphere properties of the guest subnet from the default of 120 to 2040 (your environment may dictate another value). Once the change was made (it requires a recycle of each host), the issues resolved themselves.