Solved: Performance Problems with Multiple Cells (5.1.2)?

FGShepherdP10 · ‎05-29-2013

Whenever I'm using a single cell, I get great performance. But when I fire up my secondary cells, my performance falls off terribly (15 minutes to create a new vApp "shell" or to mount an ISO...Not great).

Has anyone else seen this behavior?

FGShepherdP10 · ‎06-19-2013

The firewall between the DMZ and Active Directory's time sources (in another city) wasn't open. Pretty simple (now).
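A blocked path to the time sources usually shows up as a peer whose "reach" value stays at 0 in `ntpq -pn`. The sketch below is one way to list such peers. The function name `unreachable_peers` is hypothetical, and it assumes the standard `ntpq -pn` column layout, with reach in the seventh column.

```shell
#!/bin/sh
# Print every peer whose "reach" register is 0, reading `ntpq -pn`
# output on stdin. The first two lines of that output are the column
# header and a row of "=" signs, so they are skipped.
unreachable_peers() {
    awk 'NR > 2 && $7 == "0" {
        peer = $1
        # Drop a leading tally character such as "*" or "+", i.e.
        # anything that cannot start a numeric address.
        sub(/^[^0-9A-Fa-f:]/, "", peer)
        print peer
    }'
}

# Usage on a cell (assumes the ntp package is installed):
#   ntpq -pn | unreachable_peers
```

Any address it prints has not answered recently. In that case the firewall rule for UDP 123 toward that source is the first thing to check.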

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#Just for those working on configuring RHEL time on their cells, here's a distilled version of what we're doing, so hopefully, it will provide some guidance:

#Configuration of TIME

#Edit the ntp.conf file

#Put the following line at the very top of the ntp.conf file:

tinker panic 0

Note: The directive tinker panic 0 must be at the top of the ntp.conf file. The configuration directive tinker panic 0 instructs NTP not to give up if it sees a large jump in time. This is important for coping with large time drifts and also resuming virtual machines from their suspended state.

#Comment out the 3 lines pointing to the "pool" servers and add the following:

server us.pool.ntp.org

#Add a cron job that re-syncs NTP once an hour, at minute 30 (use */30 in the first field to run every 30 minutes instead):

echo '30 * * * * root /usr/sbin/ntpd -q -u ntp:ntp' > /etc/cron.d/ntpd

#Ensure ntp is set to start with the system

chkconfig ntpd on

#Note: VMware recommends using NTP instead of VMware Tools periodic time synchronization. NTP is an industry standard and ensures accurate timekeeping in your guest. You may have to open the firewall (UDP 123) to allow NTP traffic.

#Using RHEL as the OS for the Cells, update the following:

#Edit /etc/ntp/step-tickers:

us.pool.ntp.org

#After making changes to NTP configuration, the NTP daemon must be restarted. Refer to your operating system vendor’s documentation.

service ntpd restart

#Update ntp with the following command

ntpdate -u 0.north-america.pool.ntp.org

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#It is also important not to use the local clock as a time source, often referred to as the Undisciplined Local Clock. NTP has a tendency to fall back to this in preference to the remote servers when there is a large amount of time drift.

#An example of such a configuration is:

server 127.127.1.0
fudge 127.127.1.0 stratum 10

Comment out both lines.
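
To double-check the edits above on each cell, something like the following could be run against the config file. It is only a sketch. The function name `check_ntp_conf` is made up for illustration, and it checks just the two points from this post: `tinker panic 0` is the first active directive, and the 127.127.1.0 local clock lines are commented out.

```shell
#!/bin/sh
# check_ntp_conf FILE
# Prints one line per problem found and returns non-zero if any exist.
check_ntp_conf() {
    conf=$1
    status=0

    # First line that is neither blank nor a comment, with whitespace
    # normalised to single spaces ($1 = $1 rebuilds the record).
    first=$(awk '!/^[ \t]*(#|$)/ { $1 = $1; print; exit }' "$conf")
    if [ "$first" != "tinker panic 0" ]; then
        echo "tinker panic 0 is not the first directive"
        status=1
    fi

    # Any active (uncommented) server/fudge line for the local clock.
    if grep -Eq '^[[:space:]]*(server|fudge)[[:space:]]+127\.127\.1\.0' "$conf"; then
        echo "local clock 127.127.1.0 is still enabled"
        status=1
    fi

    return $status
}

# Usage:
#   check_ntp_conf /etc/ntp.conf && echo "ntp.conf looks OK"
```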

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Modified with content lifted directly from: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100642...


cfor · ‎05-29-2013

A few things will cause this for sure - could be others but check these to start:

Firewall ports - make sure your cells can talk freely to each other, all ESX hosts, and all vCenters.

TimeSync - the cells need to be within 2 seconds of each other on time or you will get some really bad slowdown on some operations.

DNS - make sure all ESX hosts, vCenters, and cells can resolve each other forward and reverse lookup via DNS.

Transfer space access - make sure /opt/vmware/vcloud-director/data/transfer is owned by vcloud:vcloud on each cell and that every cell points to the same shared storage. If one cell shows something different, it will cause issues.
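
For the TimeSync item, a rough way to see whether the cells are inside the 2-second window is to collect each cell's epoch time and look at the spread. This is only a sketch. The function name `max_skew` and the hostnames in the usage comment are placeholders, and fetching the timestamps one after another over ssh adds some error of its own.

```shell
#!/bin/sh
# max_skew EPOCH...
# Prints the difference in seconds between the largest and smallest
# timestamp given on the command line.
max_skew() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $((max - min))
}

# Usage (cell1..cell3 are hypothetical hostnames):
#   max_skew $(for h in cell1 cell2 cell3; do ssh "$h" date +%s; done)
```

A result of 2 or more suggests the cells are outside the tolerance mentioned above. For a more precise reading, compare the offset column of `ntpq -pn` on each cell.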

Hope this helps, if not please let us all know if you find the answer so we know what else to watch out for.

ChrisF (VCP4, VCP5, VCP-Cloud) - If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

FGShepherdP10 · ‎06-03-2013

Thanks, cfor.

I'll double-check the other suggestions (though I've done enough rounds with DNS that I'm pretty sure it's settled), but the transfer space is definitely an issue. Time may also be a factor, as they are in different clusters.

We have two cells in a DMZ for customer access and a third in a trusted Management VLAN/segment behind a firewall. The NFS/transfer server sits out in the DMZ with the other cells. When I'm watching the cell.log at the start/restart of the vmware-vcd service, I see a notification about something to the effect that "the management cell's use of the NFS transfer space cannot be verified." Maybe this is a bigger issue than I had realized.

Time to check the firewall ports again, I suppose...Is there a quick way to manually verify that the mount operation is happening on that management cell? (It's configured in init.d, but I don't know NFS on RHEL well enough to troubleshoot it.)
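
One possible way to verify the mount by hand is to look for the transfer directory in /proc/mounts and confirm it is an NFS filesystem. This is a sketch, not an official procedure. The function name `nfs_mounted` is made up, and the path is the one quoted earlier in the thread.

```shell
#!/bin/sh
# nfs_mounted PATH [MOUNTS_FILE]
# Succeeds if PATH appears as a mount point with filesystem type nfs or
# nfs4. MOUNTS_FILE defaults to /proc/mounts (fields: device, mount
# point, type, options, ...).
nfs_mounted() {
    awk -v path="$1" '
        $2 == path && ($3 == "nfs" || $3 == "nfs4") { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Usage on the management cell:
#   nfs_mounted /opt/vmware/vcloud-director/data/transfer && echo "mounted over NFS"
#   stat -c '%U:%G' /opt/vmware/vcloud-director/data/transfer   # expect vcloud:vcloud
```

If the function fails, the directory is just a local folder on that cell. Running the mount by hand should then surface the underlying error (firewall, export permissions, and so on).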

FGShepherdP10 · ‎06-04-2013

I've got sub-one-minute differences, but nowhere near sub-2-second. I used the same time sources and the following line for an initial sync:

/usr/sbin/ntpd -q -u ntp:ntp

but they're still not getting any closer to each other. Could I set them as primary sources for each other with secondary/tertiary being my time servers?

