5 Replies Latest reply: Jan 29, 2009 8:12 AM by kevinrs5855

    Support for Moving Servers / Services between Platforms w/o History Loss

    kevinrs5855 Hot Shot
      For Software as a Service environments, the ability to have Hyperic migrate or move services and/or servers from one platform to another is important.  Losing the history of performance data gathered for a service, such as one monitored by a custom JMX plugin, is a real pain every time the service is moved from one server to another.  We are always balancing the load on our platforms by moving larger or more heavily used services and/or servers across platforms.  Also, when we do move a service or a server, the old service or server sticks around and causes issues: the HQ agent reports failures, and the resource shows up as down because of the large number of failures in collecting metrics.

      So, my request is for Hyperic to add support not only for renaming resources but also for moving servers/services from one platform to another without losing history, and for moving services from one server to another.  Web service or Groovy API support would be great as well, so that Hyperic could integrate better with automatic provisioning systems.

      Message was edited by: Kevin.Schmidt@icims.com
        • 1. Re: Support for Moving Servers / Services between Platforms w/o History Loss
          excowboy Master
          Hi Kevin,

          Thanks for your feature request. As a workaround, could you try using Compatible Groups to reflect your balancing?

          Cheers,
          Mirko
          • 2. Re: Support for Moving Servers / Services between Platforms w/o History Loss
            kevinrs5855 Hot Shot
            Mirko,

                Compatible Groups wouldn't work as a workaround for my situation.  Basically, suppose you're hosting a server for customer C1 on platform P1.  If you decide to move the C1 server to platform P2, then all of the statistics for C1 collected while it was hosted on P1 are not visible on the newly discovered server on platform P2.  Not only that, but server C1 on platform P1 is reported as currently down.

                My current workaround for this is to ignore instances that are reported as down (we have other software tracking uptime and outages) and to write a Groovy script that automatically goes in and deletes all of the instances that have been down for more than 30 days.  I haven't gotten to creating this script yet, but I am hoping it won't be too difficult to write.

            Thanks for the reply and suggestion.  When I write the script I'll post it for others in case anyone stumbles upon this same issue.
            • 3. Re: Support for Moving Servers / Services between Platforms w/o History Loss
              excowboy Master
              Hi Kevin,

              I think it's not that easy to make such changes, because of the Hyperic HQ inventory model:
              http://support.hyperic.com/display/DOC/HQ+Inventory+and+Access+Model

              If you're monitoring a specific JMX service, you're monitoring a specific URL, right? And I guess if you move the C1 server to P2 you also move the service IP address of that URL? I hope you're not using the loopback interface.
              What about monitoring that URL from a dedicated host? Would that be an option for you?
              That way you could at least monitor the JMX metrics without history loss.

              Cheers,
              Mirko
              • 4. Re: Support for Moving Servers / Services between Platforms w/o History Loss
                kevinrs5855 Hot Shot
                Mirko,

                   I thought more about what you said and understand why moving services doesn't make a ton of sense.  So, as a follow-up question: is there a faster way to bulk-remove servers and/or services from Hyperic?  Currently we have many down servers and many down services.  I checked the web services API provided by Hyperic (hqapi), and it doesn't appear that removing resources is supported in the ResourceApi.  It appears that the Groovy API method org.hyperic.hq.hqu.rendit.metaclass.ResourceCategory.remove(Resource r, AuthzSubject user) performs the logic necessary to remove a resource from being monitored.  Is this the suggested way to set up automated deletion of large numbers of services and servers?  Should I just create a Groovy plugin that I can pass HTTP parameters to in order to delete the resources I want to remove?  Thanks in advance for your help.
                • 5. Re: Support for Moving Servers / Services between Platforms w/o History Loss
                  kevinrs5855 Hot Shot
                  In case this might help others, the following are two Groovy scripts I hacked together to solve my problem of deleting all of the monitored resources that appear to be 'down' because they were moved to a different server or no longer exist.  I took the approach of saying that anything listed as down for a week is most likely no longer a valid resource to monitor.

                  Here is my script to return the number of resources based on how long they have been down.
                  <code>
                  import org.hyperic.hq.measurement.server.session.AvailabilityManagerEJBImpl as AvailabilityManager
                  import org.hyperic.hq.appdef.shared.AppdefEntityID
                  import org.hyperic.hq.authz.server.session.AuthzSubject

                  // Get the availability manager and the list of currently unavailable resources.
                  def availManager = AvailabilityManager.one;
                  def unavailEntities = availManager.getUnavailEntities(null);

                  // Resources that have been down for more than 30 days (2592000000 ms).
                  def entitiesToDelByMonth = unavailEntities.findAll { it.getDuration() > 2592000000L };

                  // Resources that have been down for more than 2 weeks (1209600000 ms).
                  def entitiesToDelByTwoWeeks = unavailEntities.findAll { it.getDuration() > 1209600000L };

                  // Resources that have been down for more than 1 week (604800000 ms).
                  def entitiesToDelByOneWeek = unavailEntities.findAll { it.getDuration() > 604800000L };

                  // Report the counts as a single summary string.
                  return "Total Unavailable Resources:" + unavailEntities.size() +
                  ", Resources down for the past month:" + entitiesToDelByMonth.size() +
                  ", Resources down for the past two weeks:" + entitiesToDelByTwoWeeks.size() +
                  ", Resources down for the past week:" + entitiesToDelByOneWeek.size();
                  </code>

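                  The hard-coded millisecond values in the script above are easy to mistype. As a minimal sketch, they could be derived from day counts with the standard java.util.concurrent.TimeUnit class. The variable names, the sample durations and the countDownLongerThan helper are hypothetical stand-ins for illustration, not part of the Hyperic API.

```groovy
import java.util.concurrent.TimeUnit

// Derive the millisecond thresholds from day counts instead of hard-coding them.
long oneWeekMs  = TimeUnit.DAYS.toMillis(7)
long twoWeeksMs = TimeUnit.DAYS.toMillis(14)
long oneMonthMs = TimeUnit.DAYS.toMillis(30)

// Hypothetical stand-in data: downtime durations in milliseconds
// (in the real script these would come from it.getDuration()).
def durations = [
    TimeUnit.DAYS.toMillis(3),
    TimeUnit.DAYS.toMillis(10),
    TimeUnit.DAYS.toMillis(45)
]

// Count how many durations exceed a given threshold.
def countDownLongerThan = { long thresholdMs ->
    durations.findAll { it > thresholdMs }.size()
}

println "Down more than a week: " + countDownLongerThan(oneWeekMs)
println "Down more than a month: " + countDownLongerThan(oneMonthMs)
```

                  The derived values match the literals used in the script (604800000, 1209600000 and 2592000000).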
                  Here is my script to actually delete the Resources.
                  <code>
                  import org.hyperic.hq.hqu.rendit.util.HQUtil
                  import org.hyperic.hq.measurement.server.session.AvailabilityManagerEJBImpl as AvailabilityManager
                  import org.hyperic.hq.authz.server.session.ResourceManagerEJBImpl as ResMan
                  import org.hyperic.hq.appdef.shared.AppdefEntityID
                  import org.hyperic.hq.authz.server.session.AuthzSubject

                  // get the root user.
                  def overlord = HQUtil.overlord;

                  // get the services we are invoking.
                  def availManager = AvailabilityManager.one;
                  def resMan = ResMan.one;

                  // Get a list of all the unavailable resource measurements.
                  def unavailEntities = availManager.getUnavailEntities(null);

                  // create a list of entities that are down for at least 2 weeks, 1209600000 is 2 weeks of milliseconds.
                  def entitiesToDel = unavailEntities.findAll { it.getDuration() > 1209600000};

                  // Convert this list to a list of entity ids.
                  def idsToDel = entitiesToDel.collect { it.getEntityId() };

                  // Split the entity ids into up to 1000 groups by assigning them round-robin.
                  // Change 1000 based on the total number of entities being deleted;
                  // about 5 entries per group seems to be the sweet spot.
                  int counter = 0;
                  def groupsOfIdsToDel = idsToDel.groupBy { counter++ % 1000 };

                  // Delete each group with one removeResources call per group.
                  groupsOfIdsToDel.values().each
                  {
                    // removeResources expects an array, so copy the group's list into one.
                    AppdefEntityID[] toRemove = it.toArray(new AppdefEntityID[it.size()]);
                    resMan.removeResources(overlord, toRemove);
                  };
                  </code>
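
                  Note that the modulo grouping above fixes the number of groups (up to 1000), so the group size depends on how many ids there are. If the goal is roughly 5 ids per call, one alternative is to fix the batch size instead. This is only a sketch: toBatches and the sample ids are hypothetical names used for illustration, and it uses plain lists so it can run outside HQ.

```groovy
// Split a list into consecutive batches of at most batchSize elements.
def toBatches = { List items, int batchSize ->
    def batches = []
    for (int start = 0; start < items.size(); start += batchSize) {
        int end = Math.min(start + batchSize, items.size())
        // subList returns a view of items[start..<end]; append it as one batch.
        batches << items.subList(start, end)
    }
    return batches
}

// Hypothetical stand-in for the entity ids collected in the script above.
def ids = (1..12).toList()
def batches = toBatches(ids, 5)

batches.each { println "would delete batch: " + it }
```

                  In the delete script, each batch would then be copied into an AppdefEntityID[] and passed to resMan.removeResources, exactly as in the loop above.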

                  Mirko, if you think these belong on the wiki, or if they are a bad idea / bad practice, please let me know.  I basically learned Groovy to write these, so I am sure they aren't the 'best' way to do things, but they work (ever so slowly).