
VMware Auto Deploy is a great way to manage a stateless ESXi environment. For more information on Auto Deploy, read the VMware docs located here.

 

Whilst Auto Deploy is a great idea, as the number of hosts in your infrastructure increases your Auto Deploy server may become a bottleneck that prevents your hosts from powering on in a reasonable timeframe. As the VMware docs state:

“Simultaneously booting large numbers of hosts places a significant load on the Auto Deploy server. Because Auto Deploy is a web server at its core, you can use existing web server scaling technologies to help distribute the load. For example, one or more caching reverse proxies can be used with Auto Deploy to serve up the static files that make up the majority of an ESXi boot image. Configure the reverse proxy to cache static content and pass requests through to the Auto Deploy server.”

“After a massive power outage, VMware recommends that you bring up the hosts on a per-cluster basis. If you bring up multiple clusters simultaneously, the Auto Deploy server might experience CPU bottlenecks. All hosts come up after a potential delay. The bottleneck is less severe if you set up the reverse proxy.”


As I couldn't find any docs detailing how this could be achieved, I thought I would write something that may help others who need to scale out.

 

The basic premise is this:

 

Configure multiple tftpboot servers with custom tramp files pointing to different reverse caching proxies and then balance the requests to these hosts.

To do this I built multiple CentOS 6 hosts, configured each as a TFTP server and a Squid server, and then configured round-robin DNS to balance requests between these hosts.

 

TFTP server


Install and configure a tftp server.

 

yum install tftp-server

#enable the tftp server

vi /etc/xinetd.d/tftp

# Edit the line disable = yes to

disable = no

#Restart xinetd

service xinetd restart
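For reference, on a stock CentOS 6 install the resulting stanza in /etc/xinetd.d/tftp should look something like this (the server_args path is the TFTP root referred to below; adjust it if yours differs):

service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /var/lib/tftpboot
        disable         = no
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
}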

 

Copy the following files from your current TFTP server to your TFTP root, usually /var/lib/tftpboot:

 

undionly.kpxe.vmw-hardwired

tramp
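Assuming you have SSH access to the existing TFTP server (the host name below is just an example), something like this will pull the files across:

scp existing-tftp-server:/var/lib/tftpboot/undionly.kpxe.vmw-hardwired /var/lib/tftpboot/
scp existing-tftp-server:/var/lib/tftpboot/tramp /var/lib/tftpboot/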

 

Edit the tramp files as follows:

 

#!gpxe
set filename http://<proxy URL>:80/vmw/rbd/tramp
chain http://<proxy URL>:80/vmw/rbd/tramp
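For example, if the Squid proxy used by the first TFTP host is reachable as autodeploy-proxy01.example.local (a made-up name; each TFTP host points at a different proxy, so substitute your own), that host's tramp file would read:

#!gpxe
set filename http://autodeploy-proxy01.example.local:80/vmw/rbd/tramp
chain http://autodeploy-proxy01.example.local:80/vmw/rbd/tramp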

 

 

Squid


Install squid and then configure it as a reverse caching proxy as follows:

 

yum install squid

chkconfig squid on

 

Edit /etc/squid/squid.conf, replacing the values in angle brackets with the correct values for your infrastructure, and only include the ssl sslflags=DONT_VERIFY_PEER options on the cache_peer line if your Auto Deploy server is using self-signed certs.

 

#define ACLs

acl manager proto cache_object

acl localhost src 127.0.0.1/32 ::1

acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1

acl vSphere_sites dstdomain <proxy URL>

acl localnet src <ESX management network used to PXE boot e.g. 10.0.0.0/24>

acl Safe_ports port 80

acl CONNECT method CONNECT

#Configure access to web server

http_access allow manager localhost

http_access deny manager

http_access deny !Safe_ports

http_access allow localnet

http_access allow localhost

http_access allow vSphere_sites

http_access deny all

#Configure forwarding and caching to Autodeploy server

http_port 80 accel defaultsite=<proxy URL>

cache_peer <Autodeploy URL> parent 6501 0 no-query originserver ssl sslflags=DONT_VERIFY_PEER name=myAccel

#Define cache access

cache_peer_access myAccel allow vSphere_sites

cache_peer_access myAccel deny all

coredump_dir /var/spool/squid
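The config above relies on Squid's in-memory cache (hence the TCP_MEM_HIT entries in the sample log further down). If you would also like cached image files to survive a Squid restart, you could optionally add on-disk caching; the sizes below are only examples. Either way, it is worth checking the config parses cleanly before starting the service:

maximum_object_size 100 MB

cache_dir ufs /var/spool/squid 2048 16 256

#Check the config and start Squid

squid -k parse

service squid start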

 

DHCP

 

Finally, you need to update your DHCP config to serve multiple TFTP servers with option 66. I have read some articles suggesting that this can be done in some DHCP servers; however, this is against RFC 2132 and will most likely not work.

 

I decided to use round-robin DNS instead, and so configured option 66 with a host name that has an A record in DNS for each TFTP server. This then balances requests between the servers. This can be done by adding the following:

 

next-server <autodeploy roundrobin DNS address>;
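For context, a fuller ISC dhcpd stanza might look something like the sketch below; the subnet, host name and addresses are made-up examples:

subnet 10.0.0.0 netmask 255.255.255.0 {
        range 10.0.0.100 10.0.0.200;
        option routers 10.0.0.1;
        next-server autodeploy-tftp.example.local;
        filename "undionly.kpxe.vmw-hardwired";
}

And in the DNS zone, one A record per TFTP/Squid host provides the round robin:

autodeploy-tftp    IN    A    10.0.0.11

autodeploy-tftp    IN    A    10.0.0.12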

 

It is worth mentioning that if you are serious about resilience, you would be better off using an intelligent load balancer that can handle requests to the TFTP servers.

 

Finally, test the setup. Boot a number of hosts and monitor the Squid access log on your proxies. You will start to see that elements of the image are loaded directly from Squid's cache, removing load from the Auto Deploy host.
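On each proxy the access log can be followed with something like this (the path is the CentOS default; adjust if yours differs):

tail -f /var/log/squid/access.log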

 


1334833786.434    334 172.21.1.103 TCP_MEM_HIT/200 367334 GET http://<proxy IP>/vmw/cache/52/5d959364a6cd2e4609e471bad4f246/scsi-lpf.07f8f2635938dc247dd71cf757947ad6 - NONE/- text/html

1334833786.486     10 172.21.1.103 TCP_MEM_HIT/200 30402 GET http://<proxy IP>/vmw/cache/a7/67d0a90aec4fe0345daed522ef47db/scsi-meg.6717a7a3865d8a3775691dcc5b434a03 - NONE/- text/html

 

That's it! You now have an Auto Deploy infrastructure capable of booting many hosts quickly without overloading the Auto Deploy service.
